Python: conditional sum with variable condition

Good Morning.

The task is as follows, I'm trying to make a 'A,b,c' curve of products by company. I have the following df as an example:

df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400]
               })

To make the abc curve, you need to create a column with the cumulative sum of the total value per company, and get the following value (vl_acm):

df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400],
               "vl_acm": [100,300,450,50,250,550,950]
               })

In Excel it is quite easy to do this by applying the function sum up and putting the values of the column 'company' as conditional, but in Python I could not find a way to do it.

I thank you guys right now.

Author: Alvaro Baraldi, 2020-06-05

3 answers

You can make this sum using the cumsum method after doing groupby by Company:

df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400]
               })
df['vl_acm'] = df.groupby('empresa')['vl_total'].cumsum()
df

Out[2]:
    empresa produto vl_total    vl_acm
0   HST     prod1   100         100
1   HST     prod2   200         300
2   HST     prod3   150         450
3   HSC     prod4   50          50
4   HSC     prod5   200         250
5   HSC     prod6   300         550
6   HSC     prod7   400         950
 2
Author: drec4s, 2020-06-05 17:05:14

Friend, I am a beginner, but still I will try to help you because I believe I have found a solution. For sure there are more efficient ways but to help you I thought as follows:

import pandas as pd
df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400]
               })
filtro = df["empresa", "vl_total"] # Aqui você consegue colocar na váriavel filtro apenas
                                   # as colunhas 'empresa' e 'vl_total'
vl_acm = [] # Cria uma nova lista que será a coluna que você quer atribuir ao df
valores_dos_indices = [] # Cria um vetor para colocar os indices certos para posteriormente
                        # serem colocados os valores nas linhas corretas
for i in range(filtro["empresa"]): # O valor 'i' passará por todas as empresas
    if filtro["empresa"][i] == 'HSC': # Se a empresa correspondente ao indice 'i' for igual ao valor 'HSC'
        vl_acm.append(filtro["vl_total"][i]) # O valor da empressa  correspondente ao indice 'i' será atribuida ao vetor criado 
        valores_dos_indices.append(i)

df["vl_acm"] = [] # Cria uma nova coluna no df, por enquanto vazia
for i in valores_dos_indices: # Percorre todos os indices já marcados anteriormente
    df['vl_acm'].append(vl_acm[i]) # Preenche a nova coluna com os valores correspondentes

Was that more or less what you wanted? If not, explain it to me better so I can help you. Abs

 0
Author: JoaoP_L, 2020-06-05 16:42:53

You need to use pandas.groupby to group by company, and then cumsum to make the cumulative sum for each group (company).

Knowing this is a matter of one line:

df = pd.DataFrame({
    "empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"], 
    "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"], 
    "vl_total": [100,200,150,50,200,300,400]
})

df['vl_acm'] = df.groupby('empresa').cumsum()

DataFrame obtained at the end:

  empresa produto  vl_total  vl_acm
0     HST   prod1       100     100
1     HST   prod2       200     300
2     HST   prod3       150     450
3     HSC   prod4        50      50
4     HSC   prod5       200     250
5     HSC   prod6       300     550
6     HSC   prod7       400     950
 0
Author: jfaccioni, 2020-06-05 18:26:02