How to use a quadratic regression model?
I'm trying to learn how to fit a quadratic regression model. The dataset can be downloaded at: https://filebin.net/ztr9har5nio7x78v
Let AdjSalePrice be the target variable and "SqFtTotLiving", "SqFtLot", "Bathrooms", "Bedrooms", "BldgGrade" the predictor variables. Suppose SqFtTotLiving is the variable that should have degree 2. Here is my Python code so far:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import sklearn
houses = pd.read_csv("house_sales.csv", sep = '\t')  # the separator is a tab
colunas = ["AdjSalePrice","SqFtTotLiving","SqFtLot","Bathrooms","Bedrooms","BldgGrade"]
houses1 = houses[colunas]
X = houses1.iloc[:, 1:]  # predictor variables
y = houses1.iloc[:, 0]   # target variable
How can I fit a quadratic regression model using sklearn and statsmodels? I only know how to fit a linear regression...
2 answers
Using only statsmodels:
With statsmodels you can write the desired formula, such as:
target ~ np.power(X1, 2) + X2
In this example, it means that we are looking for the parameters a0, a1 and a2 that best approximate:
target = a0 + a1 * X1^2 + a2 * X2
(the formula interface adds the intercept a0 automatically).
A practical example in your case would be to write out the formula and pass houses.to_dict('list') (or the DataFrame itself) as data:
import statsmodels.formula.api as smf
import numpy as np
model = smf.ols(formula = 'AdjSalePrice ~ np.power(SqFtTotLiving, 2) + SqFtLot + Bathrooms + Bedrooms + BldgGrade', data = houses.to_dict('list')).fit()
Then to use the trained model, just do:
model.predict({
"SqFtTotLiving":[20],
"SqFtLot":[10],
"Bathrooms":[2],
"Bedrooms":[4],
"BldgGrade":[10]
})
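For a runnable end-to-end sketch, the snippet below fabricates a small DataFrame with the question's column names (since the linked dataset may no longer be available) and fits the same formula; the data and coefficients are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for house_sales.csv: same column names as the question
rng = np.random.default_rng(0)
n = 200
houses = pd.DataFrame({
    "SqFtTotLiving": rng.uniform(500, 4000, n),
    "SqFtLot": rng.uniform(1000, 20000, n),
    "Bathrooms": rng.integers(1, 4, n),
    "Bedrooms": rng.integers(1, 6, n),
    "BldgGrade": rng.integers(5, 12, n),
})
# Target built to be quadratic in SqFtTotLiving, linear in the rest
houses["AdjSalePrice"] = (
    0.05 * houses["SqFtTotLiving"] ** 2
    + 2.0 * houses["SqFtLot"]
    + 10_000 * houses["BldgGrade"]
    + rng.normal(0, 1000, n)
)

model = smf.ols(
    "AdjSalePrice ~ np.power(SqFtTotLiving, 2) + SqFtLot"
    " + Bathrooms + Bedrooms + BldgGrade",
    data=houses,
).fit()
print(round(model.rsquared, 4))  # close to 1: the quadratic term captures the curvature
```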
It is worth pointing out that the formula interface already includes the bias (a column of 1s, the intercept) by default; with the array-based API (sm.OLS) you would need to add it yourself with sm.add_constant.
Using only sklearn:
You can generate polynomial inputs with sklearn.preprocessing.PolynomialFeatures and then apply a linear regression. This transformer maps a vector such as [x1, x2] to [1, x1, x2, x1^2, x1*x2, x2^2].
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
# Example inputs: rows are [X1, X2, X3, X4]
X = [[0.99, 0.65, 0.35, 0.01], [0.6, 0.01, 0.5, 0.2]]
target = [1, 0]
poly = PolynomialFeatures(degree=2, include_bias=True)
X_polinomial = poly.fit_transform(X)
print(np.round(X_polinomial[0], decimals=3))
# [1.    0.99  0.65  0.35  0.01  0.98  0.644 0.346 0.01  0.423 0.227 0.007 0.122 0.003 0.   ]
# [bias, X1,   X2,   X3,   X4,   X1^2, X1*X2, X1*X3, X1*X4, X2^2, X2*X3, X2*X4, X3^2, X3*X4, X4^2]
# [0,    1,    2,    3,    4,    5,    6,     7,     8,     9,    10,    11,    12,    13,    14]
clf = linear_model.LinearRegression()
clf.fit(X_polinomial, target)
To choose which columns you want as input, for example only the bias, X1, X2 and X1^2, just do:
features_to_use = [0, 1, 2, 5]
clf = linear_model.LinearRegression()
clf.fit(X_polinomial[:, features_to_use], target)
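Coming back to the question, where only SqFtTotLiving should have degree 2, a simpler route than the full polynomial expansion is to append that one squared column yourself and fit a plain LinearRegression (a ColumnTransformer applying PolynomialFeatures to just that column would also work). The frame below is synthetic, reusing the question's column names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "SqFtTotLiving": rng.uniform(500, 4000, n),
    "SqFtLot": rng.uniform(1000, 20000, n),
    "Bathrooms": rng.integers(1, 4, n),
    "Bedrooms": rng.integers(1, 6, n),
    "BldgGrade": rng.integers(5, 12, n),
})
# Noise-free synthetic target: quadratic in SqFtTotLiving, linear in the others
y = 0.05 * X["SqFtTotLiving"] ** 2 + 2.0 * X["SqFtLot"] + 10_000 * X["BldgGrade"]

# Square only the one predictor; every other column stays linear
X2 = X.assign(SqFtTotLiving2=X["SqFtTotLiving"] ** 2)

reg = LinearRegression().fit(X2, y)
coef = dict(zip(X2.columns, reg.coef_))
print(round(coef["SqFtTotLiving2"], 3))  # recovers the 0.05 used above
```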
Good afternoon, have you tried evaluating the fit with R²?
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
R2 = r2_score(y_true, y_pred)
MSE = mean_squared_error(y_true, y_pred)  # error calculation
Both metrics take two arguments: the true target values and the model's predictions.
I hope this helps.
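Note that both metrics take two arguments, the true targets and the predictions; calling them with a single array raises an error. A minimal runnable example with invented numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # observed values (invented)
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # model predictions (invented)

r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(round(mse, 3))  # 0.375
print(round(r2, 3))   # 0.949
```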