Creating an array from nested Python elements
Hello everyone
Please help me develop an algorithm.
There is a source list consisting of dictionaries:
[
{
'A':'string1',
'B':'string_5',
'C':'string_9,
...
'N':'stringN',
'metric1':5,
'metric2':7
},
{
'A':'string1',
'B':'string_5',
'C':'string_11,...
'N':'stringN',
'metric1':10,
'metric2':45
},
{
'A':'string2',
'B':'string_7',
'C':'string_15,
...
'N':'stringN',
'metric1':234,
'metric2':78
},
.......
]
At the output, you need to get the following list:
[
{
'A':'string1', 'metric1':sum(all metric1 where 'A'='string1'),
'metric2':sum(all metric2 where 'A'='string1'),
'B': [
{
'B':'string_5',
'metric1':sum(all metric1 for combination where 'A' in
'string1' and 'B' in string_5),
'metric2':sum(all metric2 , where combination Similarly 'metric1'),
'C':[
{
'C':'string_9', 'metric1':sum(all metric_1 for
combination where 'A' in` 'string1' and 'B' in string_5 and 'C' is 'string_9'),
'metric2':sum(...)
},
{
'C':'string_11',
'metric1':sum(...)
'metric2':sum(..)
}
]
}
]
},
{
'A':'string_2', 'metric1':sum(all metric1 where 'A'='string2'),
'metric2':sum(...),
'B': [
{
'B':'string_7','metric1':sum(all metric1 for
combination where 'A'='string_2' and 'B'='string_7'),
'metric2':'sum(...),
'C': [
{
'C':'string_15',
'metric1' :sum(...),
'metric2':'sum(...)
}
]
}
]
},
....
.....
]
That is, the algorithm should do the following: we go through the entire list of dictionaries in order, and group them by the first key (in our case, "A", summing up each of the metrics, and getting a dictionary with keys at the output: - Name of the level - The sum of each of the metrics - Empty list, with the name of the next level. ("B":[])
Then we go through the second level ("B"), group them by the second key, and those elements that have the first level "A" - we enter them in the previously created array "B", again summing all the metrics, and at the output we get a similar dictionary as above, where there is an empty list with the name of the next level ("C"). And so on, until the last level.
I tried using a recursive function that called itself for each element on the bottom but in the end, I got a dictionary that included only the first elements on each level, and I didn't know how to make a recursion for each branch obtained as a result of grouping.
1 answers
A fairly simple code is obtained by grouping dictionaries by their current level value (such as 'string1'
) and simply summing the values of the corresponding metrics for the selected dictionaries, calling the function recursively to get the values for the lower levels:
#!/usr/bin/env python3
from collections import defaultdict
def sum_metrics(dicts, metric_names=('metric1', 'metric2')):
return {metric: sum(d[metric] for d in dicts) for metric in metric_names}
def generate_level_simple(dicts, levels):
if not levels: # no levels
return {} # nothing to generate
level, *levels = levels # pop level
level2dicts = defaultdict(list) # level value -> dicts
for d in dicts:
level2dicts[d[level]].append(d)
return {level: [{level: level_value,
**sum_metrics(level_dicts),
**generate_level_simple(level_dicts, levels)}
for level_value, level_dicts in level2dicts.items()]}
print(generate_level_simple(dicts, order)[order[0]])
Where dicts
is the original list of dictionaries from the question, and order
-this is a list that defines the nesting order of the levels.
Result
[{'A': 'string1',
'B': [{'B': 'string_5',
'C': [{'C': 'string_9', 'metric1': 5, 'metric2': 7},
{'C': 'string_11', 'metric1': 10, 'metric2': 45}],
'metric1': 15,
'metric2': 52}],
'metric1': 15,
'metric2': 52},
{'A': 'string2',
'B': [{'B': 'string_7',
'C': [{'C': 'string_15', 'metric1': 234, 'metric2': 78}],
'metric1': 234,
'metric2': 78}],
'metric1': 234,
'metric2': 78}]
The code works, but it can be inefficient like this how the higher level re-sums the metrics already calculated at the lower levels. This can be avoided by a small change in the code:
def generate_level(dicts, level, levels):
level2dicts = defaultdict(list) # level value -> dicts
for d in dicts:
level2dicts[d[level]].append(d)
if not levels: # the deepest level
return {level: [{level: level_value, **sum_metrics(level_dicts)}
for level_value, level_dicts in level2dicts.items()]}
inner_level, *levels = levels # pop level
inner_dicts = (generate_level(level_dicts, inner_level, levels)
for level_dicts in level2dicts.values())
return {level: [{level: level_value,
**sum_metrics(level_dict[inner_level]),
**level_dict}
for level_value, level_dict in zip(level2dicts, inner_dicts)]}
level, *nested_levels = order
print(generate_level(dicts, level, nested_levels)[level])
In this case, the metrics for dictionaries are summed only from the immediately adjacent level (without unnecessary duplication). The result is the same.
The code uses the Python 3 syntax for decomposing / decompressing lists and dictionaries, introduced in PEP 3132 and PEP 448 (the latter is available since Python 3.5).