How to use a quadratic regression model?

I'm trying to learn how to fit a quadratic regression model. The dataset can be downloaded at: https://filebin.net/ztr9har5nio7x78v

Let AdjSalePrice be the target variable and "SqFtTotLiving", "SqFtLot", "Bathrooms", "Bedrooms", "BldgGrade" the predictor variables.

Suppose SqFtTotLiving is the variable that should have degree 2. Here is my Python code:

import pandas as pd
import numpy as np
import statsmodels.api as sm
import sklearn


houses = pd.read_csv("house_sales.csv", sep='\t')  # separator is tab

colunas = ["AdjSalePrice","SqFtTotLiving","SqFtLot","Bathrooms","Bedrooms","BldgGrade"]

houses1 = houses[colunas]


X = houses1.iloc[:, 1:]  # predictor variables
y = houses1.iloc[:, 0]   # target variable

How do I fit a quadratic regression model using sklearn and statsmodels? I can only fit a linear regression...

Author: Ed S, 2019-11-06

2 answers

Using only statsmodels:

With statsmodels you can write the desired formula, such as:

target ~ np.power(X1, 2) + X2

In this example, it means that we are looking for the intercept a0 and the parameters a1 and a2 that best approximate:

target = a0 + a1 * X1^2 + a2 * X2
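To see what this formula does in practice, here is a minimal sketch on synthetic data (the names X1, X2 and target are made up for the example):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data where the true relation is target = 3*X1^2 + 2*X2 + noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"X1": rng.uniform(0, 10, 200),
                   "X2": rng.uniform(0, 5, 200)})
df["target"] = 3.0 * df["X1"] ** 2 + 2.0 * df["X2"] + rng.normal(0, 0.1, 200)

# The quadratic term goes directly into the formula string
model = smf.ols("target ~ np.power(X1, 2) + X2", data=df).fit()
print(model.params)  # intercept near 0, np.power(X1, 2) near 3, X2 near 2
```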

A practical example in your case would be to write the full formula and pass houses.to_dict('list') as data (passing the DataFrame directly also works):

import statsmodels.formula.api as sm
import numpy as np

model = sm.ols(formula = 'AdjSalePrice ~ np.power(SqFtTotLiving, 2) + SqFtLot + Bathrooms + Bedrooms + BldgGrade', data = houses.to_dict('list')).fit()

Then to use the trained model, just do:

model.predict({
    "SqFtTotLiving":[20],
    "SqFtLot":[10],
    "Bathrooms":[2],
    "Bedrooms":[4],
    "BldgGrade":[10]
})

It is worth pointing out that the formula API adds the intercept (the bias, a column of 1s) automatically; if you use the array interface (sm.OLS) instead, you need to add that column yourself, for example with sm.add_constant.

Using only sklearn:

You can generate polynomial features with sklearn.preprocessing.PolynomialFeatures and then fit a linear regression on them. This transformer maps a vector such as [x1, x2] to [1, x1, x2, x1^2, x1*x2, x2^2].

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

# Example inputs
X = [[0.99, 0.65, 0.35, 0.01], [0.6, 0.01, 0.5, 0.2]]
#    [  X1,   X2,   X3,   X4]
target = [1, 0]

poly = PolynomialFeatures(degree=2, include_bias=True)
X_polinomial = poly.fit_transform(X)

>>> print(np.round(X_polinomial[0], decimals=3))
 [   1, 0.99, 0.65, 0.35, 0.01,  0.98, 0.644, 0.346,  0.01, 0.423, 0.227, 0.007, 0.122, 0.003,    0.]
#[bias,   X1,   X2,   X3,   X4, X1*X1, X1*X2, X1*X3, X1*X4, X2*X2, X2*X3, X2*X4, X3*X3, X3*X4, X4*X4]
#[   0,    1,    2,    3,    4,     5,     6,     7,      8,    9,    10,    11,    12,    13,    14]

clf = linear_model.LinearRegression()
clf.fit(X_polinomial, target)

To choose which columns to use as input, for example only the bias, X1, X2 and X1^2, just do:

features_to_use = [0, 1, 2, 5]

clf = linear_model.LinearRegression()
clf.fit(X_polinomial[:, features_to_use], target)
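Since the question only wants SqFtTotLiving raised to degree 2, an alternative to PolynomialFeatures is to add the single squared column by hand; a sketch with a made-up DataFrame standing in for houses1:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 10 + 2e-6 * SqFtTotLiving^2 + 0.001 * SqFtLot
df = pd.DataFrame({
    "SqFtTotLiving": [1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 1200.0],
    "SqFtLot":       [5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 5500.0],
})
df["SqFtTotLiving_sq"] = df["SqFtTotLiving"] ** 2  # only the square we need
y = 10.0 + 2e-6 * df["SqFtTotLiving_sq"] + 0.001 * df["SqFtLot"]

reg = LinearRegression().fit(df[["SqFtTotLiving_sq", "SqFtLot"]], y)
print(reg.coef_, reg.intercept_)
```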


 2
Author: AlexCiuffa, 2019-11-17 22:41:03

Good afternoon, have you tried using R²?

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

R2 = r2_score(y_true, y_pred)             # coefficient of determination
MSE = mean_squared_error(y_true, y_pred)  # error calculation

These metrics compare the observed values with the model's predictions.
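The metrics above need both the true targets and the model's predictions; a minimal runnable sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical observed values and model predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

print(r2_score(y_true, y_pred))            # about 0.9486
print(mean_squared_error(y_true, y_pred))  # 0.375
```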

I hope I helped.

 -1
Author: Rodrigo Almeida Bezerra, 2019-11-17 17:03:35