Python: conditional sum with variable condition
Good Morning.
The task is as follows, I'm trying to make a 'A,b,c' curve of products by company. I have the following df as an example:
df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
"produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
"vl_total": [100,200,150,50,200,300,400]
})
To make the abc curve, you need to create a column with the cumulative sum of the total value per company, and get the following value (vl_acm):
df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
"produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
"vl_total": [100,200,150,50,200,300,400],
"vl_acm": [100,300,450,50,250,550,950]
})
In Excel it is quite easy to do this by applying the function sum up and putting the values of the column 'company' as conditional, but in Python I could not find a way to do it.
I thank you guys right now.
3 answers
You can make this sum using the cumsum
method after doing groupby
by Company:
df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
"produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
"vl_total": [100,200,150,50,200,300,400]
})
df['vl_acm'] = df.groupby('empresa')['vl_total'].cumsum()
df
Out[2]:
empresa produto vl_total vl_acm
0 HST prod1 100 100
1 HST prod2 200 300
2 HST prod3 150 450
3 HSC prod4 50 50
4 HSC prod5 200 250
5 HSC prod6 300 550
6 HSC prod7 400 950
Friend, I am a beginner, but still I will try to help you because I believe I have found a solution. For sure there are more efficient ways but to help you I thought as follows:
import pandas as pd
df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
"produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
"vl_total": [100,200,150,50,200,300,400]
})
filtro = df["empresa", "vl_total"] # Aqui você consegue colocar na váriavel filtro apenas
# as colunhas 'empresa' e 'vl_total'
vl_acm = [] # Cria uma nova lista que será a coluna que você quer atribuir ao df
valores_dos_indices = [] # Cria um vetor para colocar os indices certos para posteriormente
# serem colocados os valores nas linhas corretas
for i in range(filtro["empresa"]): # O valor 'i' passará por todas as empresas
if filtro["empresa"][i] == 'HSC': # Se a empresa correspondente ao indice 'i' for igual ao valor 'HSC'
vl_acm.append(filtro["vl_total"][i]) # O valor da empressa correspondente ao indice 'i' será atribuida ao vetor criado
valores_dos_indices.append(i)
df["vl_acm"] = [] # Cria uma nova coluna no df, por enquanto vazia
for i in valores_dos_indices: # Percorre todos os indices já marcados anteriormente
df['vl_acm'].append(vl_acm[i]) # Preenche a nova coluna com os valores correspondentes
Was that more or less what you wanted? If not, explain it to me better so I can help you. Abs
You need to use pandas.groupby
to group by company, and then cumsum
to make the cumulative sum for each group (company).
Knowing this is a matter of one line:
df = pd.DataFrame({
"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
"produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
"vl_total": [100,200,150,50,200,300,400]
})
df['vl_acm'] = df.groupby('empresa').cumsum()
DataFrame obtained at the end:
empresa produto vl_total vl_acm
0 HST prod1 100 100
1 HST prod2 200 300
2 HST prod3 150 450
3 HSC prod4 50 50
4 HSC prod5 200 250
5 HSC prod6 300 550
6 HSC prod7 400 950