Creating an array from nested Python elements

Hello everyone

Please help me develop an algorithm.

There is a source list consisting of dictionaries:

[

   {
       'A':'string1',
       'B':'string_5',
       'C':'string_9,
        ...
       'N':'stringN',
       'metric1':5,
       'metric2':7
    },

    {
      'A':'string1',
      'B':'string_5',
      'C':'string_11,...
      'N':'stringN',
      'metric1':10,
      'metric2':45
    },

    {
      'A':'string2',
      'B':'string_7',
      'C':'string_15,
       ...
      'N':'stringN',
      'metric1':234,
      'metric2':78
    },

    .......
 ]

At the output, you need to get the following list:

       [

           {
            'A':'string1', 'metric1':sum(all metric1 where 'A'='string1'), 
            'metric2':sum(all metric2 where 'A'='string1'), 
            'B': [

                   {
                  'B':'string_5',
                  'metric1':sum(all metric1 for combination where 'A' in 
                                          'string1' and 'B' in string_5), 
                   'metric2':sum(all metric2 , where combination Similarly 'metric1'), 
                   'C':[

                         {
                           'C':'string_9', 'metric1':sum(all metric_1 for 
                     combination where 'A' in` 'string1' and 'B' in string_5 and 'C' is 'string_9'),
                           'metric2':sum(...)
                          }, 
                          {
                            'C':'string_11',
                            'metric1':sum(...)
                            'metric2':sum(..)
                          }
                       ]

                   }
                 ]
             },
             {
               'A':'string_2', 'metric1':sum(all metric1 where 'A'='string2'),
              'metric2':sum(...), 
              'B': [

                      {
                        'B':'string_7','metric1':sum(all metric1 for   
                  combination where 'A'='string_2' and 'B'='string_7'), 
                        'metric2':'sum(...),
                        'C': [
                               {
                                 'C':'string_15', 
                                  'metric1' :sum(...), 
                                 'metric2':'sum(...)
                               }
                             ]
                       }
                    ]
               },

         ....

            .....
        ]  

That is, the algorithm should do the following: we go through the entire list of dictionaries in order, and group them by the first key (in our case, "A", summing up each of the metrics, and getting a dictionary with keys at the output: - Name of the level - The sum of each of the metrics - Empty list, with the name of the next level. ("B":[])

Then we go through the second level ("B"), group them by the second key, and those elements that have the first level "A" - we enter them in the previously created array "B", again summing all the metrics, and at the output we get a similar dictionary as above, where there is an empty list with the name of the next level ("C"). And so on, until the last level.

I tried using a recursive function that called itself for each element on the bottom but in the end, I got a dictionary that included only the first elements on each level, and I didn't know how to make a recursion for each branch obtained as a result of grouping.

Author: jfs, 2016-09-11

1 answers

A fairly simple code is obtained by grouping dictionaries by their current level value (such as 'string1') and simply summing the values of the corresponding metrics for the selected dictionaries, calling the function recursively to get the values for the lower levels:

#!/usr/bin/env python3
from collections import defaultdict

def sum_metrics(dicts, metric_names=('metric1', 'metric2')):
    return {metric: sum(d[metric] for d in dicts) for metric in metric_names}

def generate_level_simple(dicts, levels):
    if not levels: # no levels
        return {}  # nothing to generate

    level, *levels = levels # pop level
    level2dicts = defaultdict(list)  # level value -> dicts
    for d in dicts:
        level2dicts[d[level]].append(d)

    return {level: [{level: level_value,
                     **sum_metrics(level_dicts),
                     **generate_level_simple(level_dicts,  levels)}
                    for level_value, level_dicts in level2dicts.items()]}

print(generate_level_simple(dicts, order)[order[0]])

Where dicts is the original list of dictionaries from the question, and order-this is a list that defines the nesting order of the levels.

Result

[{'A': 'string1',
  'B': [{'B': 'string_5',
         'C': [{'C': 'string_9', 'metric1': 5, 'metric2': 7},
               {'C': 'string_11', 'metric1': 10, 'metric2': 45}],
         'metric1': 15,
         'metric2': 52}],
  'metric1': 15,
  'metric2': 52},
 {'A': 'string2',
  'B': [{'B': 'string_7',
         'C': [{'C': 'string_15', 'metric1': 234, 'metric2': 78}],
         'metric1': 234,
         'metric2': 78}],
  'metric1': 234,
  'metric2': 78}]

The code works, but it can be inefficient like this how the higher level re-sums the metrics already calculated at the lower levels. This can be avoided by a small change in the code:

def generate_level(dicts, level, levels):
    level2dicts = defaultdict(list)  # level value -> dicts
    for d in dicts:
        level2dicts[d[level]].append(d)

    if not levels:  # the deepest level
        return {level: [{level: level_value, **sum_metrics(level_dicts)}
                        for level_value, level_dicts in level2dicts.items()]}

    inner_level, *levels = levels  # pop level
    inner_dicts = (generate_level(level_dicts, inner_level, levels)
                   for level_dicts in level2dicts.values())
    return {level: [{level: level_value,
                     **sum_metrics(level_dict[inner_level]),
                     **level_dict}
                    for level_value, level_dict in zip(level2dicts, inner_dicts)]}

level, *nested_levels = order
print(generate_level(dicts, level, nested_levels)[level])

In this case, the metrics for dictionaries are summed only from the immediately adjacent level (without unnecessary duplication). The result is the same.

The code uses the Python 3 syntax for decomposing / decompressing lists and dictionaries, introduced in PEP 3132 and PEP 448 (the latter is available since Python 3.5).

 3
Author: jfs, 2017-04-13 12:53:26