Linear regression for multiple products
I ran a simple regression on a dataset containing a single product (columns: Produto, Volume, Preço), and it ran perfectly. Now I would like to run the same regression on a dataset with several products and be able to choose which product the regression runs on. For example:
Produto | Volume | Preço
A
A
B
B
I want to run the regression only on product B.
How can I do this?
And how can I run the regression on all products but get the results back separately, so that I can compare them side by side?
Code:
import pandas as pd
from scipy.stats import linregress

# Load the 'Tela' sheet from the workbook
Pasta1 = pd.ExcelFile('Pasta2.xlsx')
Daniel = pd.read_excel(Pasta1, 'Tela')

x = Daniel['Preço']
y = Daniel['Volume']
# linregress returns slope, intercept, correlation coefficient, p-value and standard error of the slope
m, b, R, p, SEm = linregress(x, y)
pd.DataFrame([m, b, R, p, SEm], columns=['Valores'],
             index=['declive', 'ordenada_na_origem',
                    'coeficiente_de_correlação_(de_Pearson)', 'p-value', 'erro_padrão'])
Result:
Valores
declive: 421.398071
ordenada_na_origem: 1432.443189
coeficiente_de_correlação_(de_Pearson): 0.331966
p-value: 0.000003
erro_padrão: 86.869651
2 answers
Based on what your data appears to look like, I was able to solve this with the .loc indexer of the pandas DataFrame.
Here is an example of how I did it:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,4),index=list('abadaf'),columns=list('ABCD'))
>>df1
A B C D
a -0.973031 0.305699 1.330237 -0.799858
b -0.879060 0.238690 -2.729635 -0.457865
a -2.001388 1.058163 -0.328737 0.134416
d 0.994644 -2.305340 -0.714434 0.298462
a -2.242108 -0.331434 0.969981 0.973202
f -0.483833 0.783812 0.925608 0.590251
>>df1.loc['a']
A B C D
a -0.973031 0.305699 1.330237 -0.799858
a -2.001388 1.058163 -0.328737 0.134416
a -2.242108 -0.331434 0.969981 0.973202
>> df1.loc['a','A']
a -0.973031
a -2.001388
a -2.242108
Here the "product name" acts as the index. If you want to select the data based on the values in a column (strings or numbers), you can use .loc together with boolean expressions:
>> df1 = pd.DataFrame([['a',1,2,3],['b',2,3,4],['a',3,4,5],['c',4,5,6]],index=list('defg'),columns=list('higj'))
>> df1
h i g j
d a 1 2 3
e b 2 3 4
f a 3 4 5
g c 4 5 6
>> df1.h=='a'
d True
e False
f True
g False
Name: h, dtype: bool
>> df1.loc[ df1.h=='a',:]
h i g j
d a 1 2 3
f a 3 4 5
>> df1.loc[ df1.h=='a','i']
d 1
f 3
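Applied to the data in the question, the same boolean selection might look like the sketch below. The small DataFrame is made-up data standing in for the Excel sheet (the real values are not shown in the question); only the column names Produto, Volume and Preço come from the question, and the names dados, so_B and resultado_B are just for illustration.

```python
import pandas as pd
from scipy.stats import linregress

# Hypothetical data standing in for the 'Tela' sheet; the values are invented
dados = pd.DataFrame({
    'Produto': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Preço':   [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'Volume':  [10.0, 20.0, 35.0, 5.0, 3.0, 2.0],
})

# Boolean mask: True only on the rows that belong to product B
so_B = dados.loc[dados['Produto'] == 'B']

# Regress Volume on Preço using the rows of product B only
resultado_B = linregress(so_B['Preço'], so_B['Volume'])
print(resultado_B.slope, resultado_B.intercept)
```

Because x and y are taken from the same filtered DataFrame, they always have the same length and stay aligned row by row.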
With Guto's help, I solved it as follows:
import pandas as pd
from scipy.stats import linregress

Pasta1 = pd.ExcelFile('Pasta2.xlsx')
Daniel = pd.read_excel(Pasta1, 'Tela')

# Product A: a single filter for both columns keeps x and y the same length and aligned
dados_A = Daniel.loc[(Daniel['Preço'] > 0) & (Daniel['Volume'] > 0) & (Daniel['Produto'] == 'A')]
Produto_A = linregress(dados_A['Preço'], dados_A['Volume'])

# Product B: same filter, different product
dados_B = Daniel.loc[(Daniel['Preço'] > 0) & (Daniel['Volume'] > 0) & (Daniel['Produto'] == 'B')]
Produto_B = linregress(dados_B['Preço'], dados_B['Volume'])

# One row of regression results per product
pd.DataFrame([Produto_A, Produto_B], index=['Valores', 'Valores2'])
Now I just need to find a way to run this for more products without having to write a separate block for each one.
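One possible way to avoid a block per product is to let pandas split the data with groupby and call linregress once per group. The sketch below is only a suggestion under stated assumptions, not part of the solution above: the function name regressao_por_produto and the sample values are hypothetical, and the column names Produto, Preço and Volume are taken from the question.

```python
import pandas as pd
from scipy.stats import linregress

# Hypothetical data standing in for the 'Tela' sheet; the values are invented
dados = pd.DataFrame({
    'Produto': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Preço':   [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'Volume':  [10.0, 20.0, 35.0, 5.0, 3.0, 2.0],
})

def regressao_por_produto(df):
    """Run linregress(Preço, Volume) once per product; one result row each."""
    linhas = {}
    # groupby yields (product name, sub-DataFrame with that product's rows)
    for produto, grupo in df.groupby('Produto'):
        r = linregress(grupo['Preço'], grupo['Volume'])
        linhas[produto] = {
            'declive': r.slope,
            'ordenada_na_origem': r.intercept,
            'coeficiente_de_correlação': r.rvalue,
            'p-value': r.pvalue,
            'erro_padrão': r.stderr,
        }
    # Rows are products, columns are the regression statistics
    return pd.DataFrame.from_dict(linhas, orient='index')

tabela = regressao_por_produto(dados)
print(tabela)
```

With the real data, the call would presumably be regressao_por_produto(Daniel), optionally after filtering out rows where Preço or Volume is not positive, as in the code above. Each product ends up on its own row, so the results can be compared side by side.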