Linear regression in various products

Question

I ran a simple regression on a dataset containing a single product (columns: product, volume, price), and it ran perfectly. Now I would like to run the same regression on a dataset with several products and be able to choose which product the regression runs on. For example:

Produto | Volume | Preço
A       |        |
A       |        |
B       |        |
B       |        |

I want to run the regression only on product B.

How can I do this?
And how can I run the regression on every product but get the results back separately, so that I can analyze them side by side?

Code:

import pandas as pd
from scipy.stats import linregress

# Load the 'Tela' sheet from the workbook
Pasta1 = pd.ExcelFile('Pasta2.xlsx')
Daniel = pd.read_excel(Pasta1, 'Tela')

# Regress Volume (y) on Preço (x)
x = Daniel['Preço']
y = Daniel['Volume']
m, b, R, p, SEm = linregress(x, y)

# Show the five statistics in a single-column table
pd.DataFrame([m, b, R, p, SEm], columns=['Valores'],
             index=['declive', 'ordenada_na_origem',
                    'coeficiente_de_correlação_(de_Pearson)', 'p-value',
                    'erro_padrão'])

Result:

                                            Valores
declive                                  421.398071
ordenada_na_origem                      1432.443189
coeficiente_de_correlação_(de_Pearson)     0.331966
p-value                                    0.000003
erro_padrão                               86.869651

2

python regressão

Author: Daniel Melo, 2018-01-31

2 answers

With Guto's help, I solved it as follows:

import pandas as pd
from scipy.stats import linregress

Pasta1 = pd.ExcelFile('Pasta2.xlsx')
Daniel = pd.read_excel(Pasta1, 'Tela')

# Keep only rows where both Preço and Volume are positive.
# Filtering x and y with the same condition keeps them the same length.
validos = Daniel.loc[(Daniel['Preço'] > 0) & (Daniel['Volume'] > 0)]

# Regression for product A
dados_A = validos.loc[validos['Produto'] == 'A']
Produto_A = linregress(dados_A['Preço'], dados_A['Volume'])

# Regression for product B
dados_B = validos.loc[validos['Produto'] == 'B']
Produto_B = linregress(dados_B['Preço'], dados_B['Volume'])

# One row per product
pd.DataFrame([Produto_A, Produto_B], index=['Valores', 'Valores2'])

Now I just need a way to run this for more products without writing a separate block for each one.
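One possible way to avoid a block per product is to group the rows by Produto and fit each group in a loop. The sketch below is only an illustration: the small DataFrame is made-up data standing in for the 'Tela' sheet, and the names validos, regressao and resultados are hypothetical helpers, not part of the original code.

```python
import pandas as pd
from scipy.stats import linregress

# Made-up stand-in for the 'Tela' sheet (assumption: same column names).
Daniel = pd.DataFrame({
    'Produto': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Preço':   [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'Volume':  [10.0, 21.0, 29.0, 9.0, 7.0, 6.0],
})

# Keep rows where both variables are positive so x and y stay aligned.
validos = Daniel[(Daniel['Preço'] > 0) & (Daniel['Volume'] > 0)]


def regressao(grupo):
    """Fit Volume on Preço for one product and return the five statistics."""
    r = linregress(grupo['Preço'], grupo['Volume'])
    return pd.Series({
        'declive': r.slope,
        'ordenada_na_origem': r.intercept,
        'coeficiente_de_correlação_(de_Pearson)': r.rvalue,
        'p-value': r.pvalue,
        'erro_padrão': r.stderr,
    })


# Iterate over the groups: one regression per distinct product.
por_produto = {
    produto: regressao(grupo)
    for produto, grupo in validos.groupby('Produto')
}

# Transpose so that each product is a row and each statistic is a column.
resultados = pd.DataFrame(por_produto).T
print(resultados)
```

Because the loop runs over whatever values appear in Produto, adding a new product to the spreadsheet should not require any change to the code, and the resulting table puts the products next to each other for comparison.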

0

Author: Daniel Melo, 2018-02-01 11:10:59

score 1 · Accepted Answer

Judging by what your data seems to look like, I was able to solve this using the .loc indexer of the pandas DataFrame.

An example of how I did it:

>>> import pandas as pd
>>> import numpy as np

>>> df1 = pd.DataFrame(np.random.randn(6, 4), index=list('abadaf'), columns=list('ABCD'))
>>> df1
          A         B         C         D
a -0.973031  0.305699  1.330237 -0.799858
b -0.879060  0.238690 -2.729635 -0.457865
a -2.001388  1.058163 -0.328737  0.134416
d  0.994644 -2.305340 -0.714434  0.298462
a -2.242108 -0.331434  0.969981  0.973202
f -0.483833  0.783812  0.925608  0.590251

>>> df1.loc['a']
          A         B         C         D
a -0.973031  0.305699  1.330237 -0.799858
a -2.001388  1.058163 -0.328737  0.134416
a -2.242108 -0.331434  0.969981  0.973202

>>> df1.loc['a', 'A']
a   -0.973031
a   -2.001388
a   -2.242108
Here the "product name" acts as the index. If you want to select the data based on the values in a column (strings or numbers), you can use .loc together with boolean expressions:

>>> df1 = pd.DataFrame([['a', 1, 2, 3], ['b', 2, 3, 4], ['a', 3, 4, 5], ['c', 4, 5, 6]], index=list('defg'), columns=list('higj'))
>>> df1
   h  i  g  j
d  a  1  2  3
e  b  2  3  4
f  a  3  4  5
g  c  4  5  6

>>> df1.h == 'a'
d     True
e    False
f     True
g    False
Name: h, dtype: bool
>>> df1.loc[df1.h == 'a', :]
   h  i  g  j
d  a  1  2  3
f  a  3  4  5
>>> df1.loc[df1.h == 'a', 'i']
d    1
f    3
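Applied to the question, the same boolean selection could feed linregress directly for a single product. This is only a sketch: the values in dados are invented for illustration, and the names dados, mascara_B and resultado_B are hypothetical; only the column names Produto, Volume and Preço come from the question.

```python
import pandas as pd
from scipy.stats import linregress

# Invented data using the question's column names (assumption).
dados = pd.DataFrame({
    'Produto': ['A', 'A', 'B', 'B', 'B'],
    'Volume':  [100.0, 120.0, 80.0, 60.0, 50.0],
    'Preço':   [5.0, 4.0, 2.0, 3.0, 4.0],
})

# Boolean mask: True only on the rows that belong to product B.
mascara_B = dados['Produto'] == 'B'

# Use the same mask for both columns so x and y stay aligned row by row.
x = dados.loc[mascara_B, 'Preço']
y = dados.loc[mascara_B, 'Volume']

# Regress Volume on Preço for product B only.
resultado_B = linregress(x, y)
print(resultado_B.slope, resultado_B.intercept)
```

Changing 'B' to another product name in the mask selects that product instead, so the product to analyze becomes a single value to edit.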