Dictionary of dictionaries: printing the sub-dictionaries that share at least two common keys

d = {'g1':{'p1':1,'p2':5,'p3':11,'p4':1},
     'g2':{'p1':7,'p3':1,'p4':2,'p5':8,'p9':11},
     'g3':{'p7':7,'p8':7},
     'g4':{'p8':9,'p9':1,'p10':7,'p11':8,'p12':3},
     'g5':{'p1':4,'p13':1},
     'g6':{'p1':4,'p3':1,'p6':2,'p13':1}
    }
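Given the dictionary 'd', I want to return the clusters of sub-dictionaries that share at least two ('n') keys, that is, keys present in every sub-dictionary of a given cluster. The values of the sub-dictionaries do not matter here. In other words, the intersection of the keys of all sub-dictionaries in a given cluster should have a length of at least two (or 'n').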
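I hope I understood correctly what you want. This approach is clumsy, and I fear it is quite inefficient. I added the dictionary g6 to d in order to produce more interesting output: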
#!/usr/bin/env python3

d = {'g1':{'p1':1,'p2':5,'p3':11,'p4':1},
     'g2':{'p1':7,'p3':1,'p4':2,'p5':8,'p9':11},
     'g3':{'p7':7,'p8':7},
     'g4':{'p8':9,'p9':1,'p10':7,'p11':8,'p12':3},
     'g5':{'p1':4,'p13':1},
     'g6':{'p1':1,'p9':2,'p11':12}
    }

# Map each distinct key set to the names of the dictionaries that have exactly that key set.
clusters = {}

for key, value in d.items():
    cluster = frozenset(value)
    clusters.setdefault(cluster, set()).add(key)

# Intersect every pair of key sets. Iterate over snapshots, because
# clusters grows inside the loop.
for a in list(clusters):
    for b in list(clusters):
        if len(a & b) > 1 and a ^ b:
            cluster = a & b
            clusters.setdefault(cluster, set()).update(clusters[a] | clusters[b])

print("Primitive clusters")
for key, value in clusters.items():
    if len(value) == 1:
        print("The dictionary %s has the keys %s" % (next(iter(value)), ", ".join(sorted(key))))

print("---------------------")
print("Non-primitive clusters:")
for key, value in clusters.items():
    if len(value) > 1:
        print("The dictionaries %s share the keys %s" % (", ".join(sorted(value)), ", ".join(sorted(key))))
    
I think you should first "invert" the dictionary; after that, finding the solution is easy:
import collections
inverted = collections.defaultdict(list)

for key, items in d.items():
    for sub_key in items:
        inverted[sub_key].append(key)

for sub_key, keys in inverted.items():
    if len(keys) >= 2:
        print(sub_key, keys)
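
The inverted index above lists, for each sub-key, the dictionaries that contain it. As a rough sketch of one way to get from there to pairs of dictionaries sharing at least two keys, the shared keys can be counted per pair. This assumes the six-dictionary d from the question; the names shared and pairs are illustrative and not part of the original answer.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Assumed input: the dictionary d from the question.
d = {'g1': {'p1': 1, 'p2': 5, 'p3': 11, 'p4': 1},
     'g2': {'p1': 7, 'p3': 1, 'p4': 2, 'p5': 8, 'p9': 11},
     'g3': {'p7': 7, 'p8': 7},
     'g4': {'p8': 9, 'p9': 1, 'p10': 7, 'p11': 8, 'p12': 3},
     'g5': {'p1': 4, 'p13': 1},
     'g6': {'p1': 4, 'p3': 1, 'p6': 2, 'p13': 1}}

# Invert: sub-key -> names of the dictionaries containing it.
inverted = defaultdict(list)
for key, items in d.items():
    for sub_key in items:
        inverted[sub_key].append(key)

# Every sub-key contributes one shared key to each pair of its dictionaries.
shared = Counter()
for names in inverted.values():
    for pair in combinations(sorted(names), 2):
        shared[pair] += 1

# Keep the pairs that share at least two keys.
pairs = sorted(pair for pair, count in shared.items() if count >= 2)
print(pairs)
```

This only covers clusters of size two; larger clusters would need a further step.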
    
Something like
for keya in d:
    tempd = {}
    keys = set()
    tempset = set(d[keya].keys())

    for keyb in d:
        tempset &= d[keyb].keys()

        if len(tempset) >= 2:
            keys.add(keyb)

    print({key: d[key] for key in keys})
might work. Edit: no, it doesn't really work. I need to think about it.
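If you reduce the problem to clusters of length 2 only (that is, pairs of dictionaries), it becomes much clearer: generating fixed-length subsequences from a given iterable is exactly the job of itertools.combinations: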
>>> import itertools
>>> list(itertools.combinations(d, 2))
[('g5', 'g4'), ('g5', 'g3'), ('g5', 'g2'), ('g5', 'g1'), ('g4', 'g3'), ('g4', 'g2'), ('g4', 'g1'), ('g3', 'g2'), ('g3', 'g1'), ('g2', 'g1')]
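Once you realize that the view returned by d.keys() behaves like a set (in Python 3; in Python 2 it is a plain list), it is easy to see which keys any two dictionaries have in common: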
>>> d['g1'].keys() & d['g2'].keys()
{'p3', 'p1', 'p4'}
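& is the set intersection operator: it gives us the set of all items the two sets have in common. We can therefore check whether there are at least two of them by checking the length of this set, which gives: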
>>> common_pairs = [[x,y] for x,y in itertools.combinations(d, 2)
                                   if len(d[x].keys() & d[y].keys()) >= 2]
>>> common_pairs
[['g2', 'g1']]
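Handling an unknown cluster size is slightly harder: unless we hard-code the size, we cannot use the & operator directly. Fortunately, the set class gives us a way to get the intersection of n sets, in the form of set.intersection. Called this way, it will not accept a dict_keys instance as its first argument, but you can easily work around that by calling set: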
>>> set.intersection(d['g1'].keys(), d['g2'].keys(), d['g5'].keys())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'intersection' requires a 'set' object but received a 'dict_keys'
>>> set.intersection(set(d['g1']), set(d['g2']), set(d['g5']))
{'p1'}
You should be able to generalize this fairly easily to clusters of size 2 to n.
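
One possible way to spell out that generalization is sketched below. It combines itertools.combinations with set.intersection over every group size from 2 up to the number of dictionaries. The function name shared_key_clusters and its parameter n (the minimum number of shared keys) are hypothetical, and the input is assumed to be the six-dictionary d from the question.

```python
from itertools import combinations

# Assumed input: the dictionary d from the question.
d = {'g1': {'p1': 1, 'p2': 5, 'p3': 11, 'p4': 1},
     'g2': {'p1': 7, 'p3': 1, 'p4': 2, 'p5': 8, 'p9': 11},
     'g3': {'p7': 7, 'p8': 7},
     'g4': {'p8': 9, 'p9': 1, 'p10': 7, 'p11': 8, 'p12': 3},
     'g5': {'p1': 4, 'p13': 1},
     'g6': {'p1': 4, 'p3': 1, 'p6': 2, 'p13': 1}}


def shared_key_clusters(dicts, n=2):
    """Map each group of two or more names to the keys common to all of
    them, keeping only groups that share at least n keys."""
    result = {}
    names = sorted(dicts)
    for size in range(2, len(names) + 1):
        for group in combinations(names, size):
            # Convert each key view to a real set so set.intersection accepts it.
            common = set.intersection(*(set(dicts[name]) for name in group))
            if len(common) >= n:
                result[group] = common
    return result


clusters = shared_key_clusters(d, n=2)
for group, common in clusters.items():
    print(group, sorted(common))
```

This tries every possible group, so the work grows exponentially with the number of sub-dictionaries; it is meant for small inputs like the one in the question.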
