Linear regression with Python
I need to run a linear regression, but I read that it is not possible to use or install scipy on Windows. Is there another library similar to scipy that can do this type of calculation? Or, if there is a way to install scipy on Windows, that is also welcome! Thanks.
2 answers
From what I read on the scipy website, it does run on Windows.
"For most users, especially on Windows, the easiest way to install the packages of the SciPy stack is to download one of these Python distributions, which include all the key packages: Anaconda, Enthought Canopy, Python(x,y), WinPython, Pyzo"
Try one of these:
- Anaconda: a free distribution for the SciPy stack. Supports Linux, Windows and Mac.
- Enthought Canopy: the free and commercial versions include the core SciPy stack packages. Supports Linux, Windows and Mac.
- Python(x,y): a free distribution including the SciPy stack, based around the Spyder IDE. Windows only.
- WinPython: a free distribution including the SciPy stack. Windows only.
- Pyzo: a free distribution based on Anaconda and the IEP interactive development environment. Supports Linux, Windows and Mac.
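Once scipy is available through one of these distributions, the regression itself is short. Here is a minimal sketch, assuming scipy and numpy import correctly; it uses `scipy.stats.linregress`, and the sample values are made up purely for illustration:

```python
import numpy as np
from scipy import stats

# Illustrative sample data (an assumption, not taken from the question).
x = np.array([1, 2, 3, 4])
y = np.array([10, 26, 39, 55])

# Ordinary least-squares fit of y = slope * x + intercept.
result = stats.linregress(x, y)

print("slope:", result.slope)
print("intercept:", result.intercept)
print("correlation coefficient r:", result.rvalue)

# Use the fitted line to predict y for a new x value.
x_new = 5
y_new = result.slope * x_new + result.intercept
print("prediction for x = 5:", y_new)
```

`linregress` only handles a single explanatory variable, which is enough for a plain straight-line fit.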
According to scipy's own documentation, it does not run very well on Windows, because some of its dependencies work only on Linux and Mac.
As an alternative, I recommend sklearn (scikit-learn). It is a very good library for machine learning, and it has good documentation and several examples.
To install it, run:
pip install -U scikit-learn
Or, if you use Anaconda:
conda install scikit-learn
I made an example using numpy (for working with arrays) and matplotlib (for plotting). To install them:
pip install numpy
python -m pip install -U pip setuptools
python -m pip install matplotlib
With Anaconda, these libraries usually come already installed.
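To check whether the libraries are already there before installing anything, a small script along these lines can help. It is only a sketch using the standard library; the helper name `is_installed` is made up for this example:

```python
import importlib.util

# Import names, not pip names: scikit-learn is imported as "sklearn".
packages = ["numpy", "matplotlib", "sklearn"]


def is_installed(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


for name in packages:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Anything reported as missing can then be installed with the pip or conda commands above.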
Below is an example of a linear regression built with sklearn:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
# Logic: y = x*10 + acc
# acc = acc + 5 at each step
# acc starts at 0
# Training data set
# 1 - 10 + 0 = 10
# 2 - 20 + 5 = 25
# 3 - 30 + 10 = 40
# 4 - 40 + 15 = 55
x_train = np.array([[1], [2], [3], [4]])
y_train = np.array([10, 25, 40, 55])
# Test data set
# 5 - 50 + 20 = 70
# 6 - 60 + 25 = 85
# 7 - 70 + 30 = 100
# 8 - 80 + 35 = 115
x_test = np.array([[5], [6], [7], [8]])
y_test = np.array([70, 85, 100, 115])
# Create the model and train it (fit)
model = linear_model.LinearRegression().fit(x_train, y_train)
# Print some information
print('Coefficients: \n', model.coef_)
print("Mean squared error: %.2f" % np.mean((model.predict(x_test) - y_test) ** 2))
# score() returns the coefficient of determination (R^2)
print('R^2 score: %.2f' % model.score(x_test, y_test))
# Build the plot that displays the result
plt.scatter(x_test, y_test, color='black')
plt.plot(x_test, model.predict(x_test), color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
This code generates a plot with the test points in black and the fitted regression line in blue.
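As a cross-check that does not need sklearn, the same line can be fitted with numpy alone. This is a minimal sketch, assuming the same training and test values as in the example above, using `np.polyfit` with degree 1:

```python
import numpy as np

# Same values as the sklearn example, stored as flat 1-D arrays.
x_train = np.array([1, 2, 3, 4])
y_train = np.array([10, 25, 40, 55])
x_test = np.array([5, 6, 7, 8])
y_test = np.array([70, 85, 100, 115])

# A degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(x_train, y_train, 1)

# Predict on the test points and compute the mean squared error.
y_pred = slope * x_test + intercept
mse = np.mean((y_pred - y_test) ** 2)

print("slope: %.2f, intercept: %.2f" % (slope, intercept))
print("mean squared error: %.2f" % mse)
```

Since the data follow y = 15x - 5 exactly, the slope should come out at about 15 and the intercept at about -5, with an error close to zero.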
The sklearn documentation itself also has an example of linear regression applied to diabetes data.
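The following is not the documentation's code, only a minimal sketch in the same spirit. It assumes the diabetes data set that ships with sklearn (`load_diabetes`), and the choice of feature and the size of the test split are arbitrary choices made for this illustration:

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes data set bundled with scikit-learn.
X, y = datasets.load_diabetes(return_X_y=True)

# Keep a single feature (column 2, the body mass index) so that
# the model is a simple straight line.
X = X[:, [2]]

# Hold out the last 20 samples for testing (an arbitrary choice).
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]

# Create the model, train it, and predict on the held-out samples.
model = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Coefficients:", model.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("R^2 score: %.2f" % r2_score(y_test, y_pred))
```

Unlike the toy data above, these are real measurements, so the fit will not be perfect and the R^2 score will be well below 1.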