The simplest implementation of the linear regression algorithm. What did I get wrong?

I am implementing a linear regression algorithm with two parameters. When the dataset grows by an order of magnitude (from 10 to 100 points), the parameter values blow up to infinity. Is this my fault, or a limitation of this implementation?

Code:

import matplotlib.pyplot as plt
import random


def lin_regr(param_x, param_y):
    """
    Градиентный спуск
    :param param_x: вектор значений датасета по оси x
    :param param_y: вектор значений датасета по оси y
    :return: оптимизированные значения параметров th0 и th1
    """
    th0, th1, alfa, eps = 50, 15, 0.002, 1e-6
    tmp0, tmp1 = 1, 1
    while abs(tmp0) >= eps or abs(tmp1) >= eps:
        tmp0 = alfa * derriv(param_x, param_y, th0, th1, '0')
        tmp1 = alfa * derriv(param_x, param_y, th0, th1, '1')
        th0 -= tmp0
        th1 -= tmp1
    return th0, th1


def derriv(vect_x, vect_y, t0, t1, which):
    """
    Partial derivative of the MSE.
    :param vect_x: vector of the dataset's x values
    :param vect_y: vector of the dataset's y values
    :param t0: first parameter
    :param t1: second parameter
    :param which: selects which derivative to take (for t0 there is no extra factor of x)
    :return: value of the MSE derivative at the point (t0, t1)
    """
    # Renamed the parameter 'str' and the accumulator 'sum' so they
    # no longer shadow the built-ins of the same names.
    total = 0
    for x, y in zip(vect_x, vect_y):
        if which == '1':  # '==' instead of 'is': identity comparison of string literals is unreliable
            total += (t0 + x * t1 - y) * x
        else:
            total += (t0 + x * t1 - y)
    return total / len(vect_x)



if __name__ == '__main__':
    x = [100 * random.random() for i in range(30)]
    y = [100 * random.random() for i in range(30)]
    t0, t1 = lin_regr(x, y)
    plt.plot(x, y, marker='o', color="r", ls="")  # scatter of the data; *zip(x, y) paired the points up wrongly
    plt.plot([t0 + t1 * i for i in range(100)])   # fitted line over the data range 0..100, not just 0..9
    plt.show()
Author: insolor, 2019-10-13

1 answer

A very strange statement of the problem. Regression identifies the DEPENDENCE of one variable on another. What dependence is there to find if both variables are generated independently of each other??? Where is the poor gradient supposed to descend to???

Instead, define your x and y, for example, as

x = random.random()
y = A * x + D + random.gauss(mu, sigma)

and you will be happy. Just pick mu and sigma so that the task makes sense.
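A minimal sketch of that suggestion, run through the asker's own gradient-descent scheme. The values of A, D, the noise parameters, and the fit() helper are illustrative, not prescribed by the answer. Note also that part of the divergence described in the question comes from the step size: with x values up to 100, alpha = 0.002 is too large for the MSE gradient; here x stays in [0, 1), so a plain learning rate converges:

```python
import random

# Generate data with an actual linear dependence (A, D, sigma are
# illustrative values chosen for this sketch).
A, D = 3.0, 5.0
random.seed(42)
xs = [random.random() for _ in range(100)]
ys = [A * x + D + random.gauss(0, 0.1) for x in xs]


def fit(xs, ys, alpha=0.5, eps=1e-6, max_iter=200_000):
    """Plain batch gradient descent on the MSE for y ≈ th0 + th1 * x."""
    th0, th1 = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iter):
        g0 = sum(th0 + th1 * x - y for x, y in zip(xs, ys)) / n
        g1 = sum((th0 + th1 * x - y) * x for x, y in zip(xs, ys)) / n
        th0 -= alpha * g0
        th1 -= alpha * g1
        # Stop once both update steps are negligibly small.
        if abs(alpha * g0) < eps and abs(alpha * g1) < eps:
            break
    return th0, th1


th0, th1 = fit(xs, ys)
# th0 should land near D and th1 near A, up to the noise.
```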

And as a side note: using gradient descent to find the regression coefficients of a one-dimensional (!) linear (!!!) regression is, frankly, a perversion.
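For one predictor, ordinary least squares has a closed-form solution, so no iteration is needed at all. A sketch (the helper name ols is my own):

```python
def ols(xs, ys):
    """Closed-form least squares for y ≈ th0 + th1 * x:
    th1 = cov(x, y) / var(x), th0 = mean(y) - th1 * mean(x)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    th1 = sxy / sxx
    th0 = my - th1 * mx
    return th0, th1
```

For example, ols([0, 1, 2, 3], [1, 3, 5, 7]) recovers the exact line y = 1 + 2x.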

Author: passant, 2019-10-13 18:05:22