Why should we scale/standardize values of variables and how to reverse this transformation?
While working with multivariate prediction algorithms, I came across R's scale
function, whose purpose is to scale/standardize the values of variables.
I have no difficulty using scale
itself; my question is purely conceptual.
Why should I scale the values of my variables? What's the point? Does this make a difference for example in the accuracy of my algorithm's prediction model? And how can I reverse the transformation?
1 answer
Should I scale my inputs? The answer is: it depends.
The truth is that scaling your data won't make the result worse, so if in doubt, scale it.
Cases where you should scale
- if the model is based on distances between points, such as clustering algorithms (k-means) or dimensionality reduction (PCA), then it is necessary to scale/normalize your inputs. See this example:
Starting from the data:
Ano Preco
0 2000 2000
1 2010 3000
2 1970 2500
the Euclidean distance matrix is:
0 1 2
0 [[ 0. 1000.05 500.9 ]
1 [1000.05 0. 501.6 ]
2 [ 500.9 501.6 0. ]]
We observe that the preco
column dictates the distance, since its absolute values are much greater than those of ano
. However, when we normalize both columns to [0, 1], the result changes dramatically:
Ano_norm Preco_norm
0 0.75 0.0
1 1.00 1.0
2 0.00 0.5
The new Euclidean distance matrix is:
0 1 2
0 [[0. 1.03 0.9 ]
1 [1.03 0. 1.12]
2 [0.9 1.12 0. ]]
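The two distance matrices above can be reproduced with a short NumPy sketch (the question is about R, but the printouts in this answer are NumPy-style, so the example follows that):

```python
import numpy as np

# The data from the example above: columns ano and preco.
X = np.array([[2000.0, 2000.0],
              [2010.0, 3000.0],
              [1970.0, 2500.0]])

def dist_matrix(data):
    """Pairwise Euclidean distance matrix between the rows of data."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

print(np.round(dist_matrix(X), 2))       # preco dominates: distances ~500-1000

# Min-max normalization to [0, 1], column by column.
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.round(dist_matrix(X_norm), 2))  # both columns now contribute
```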
Another example, referring to PCA, is this one.
- for algorithms such as neural networks (see this reference), which use gradient descent and activation functions, scaling the inputs:
- gives strictly positive features both a negative and a positive part, which facilitates training.
- prevents computations from returning values such as NaN (Not a Number) during training.
- avoids inputs on different scales, which would make the weights connected to each input update at different rates (some faster than others), harming learning.
Normalizing the outputs is also important, because of the activation function of the last layer.
In this case, to return to the original scale of the output, simply save the values used in the normalization and invert the computation. For example:
To normalize:
X_norm = (X - X_min)/(X_max - X_min)
To return the original scale:
X = X_norm * (X_max - X_min) + X_min
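The two formulas above form an exact round trip as long as X_min and X_max are saved; a minimal sketch using the preco column from the example:

```python
import numpy as np

X = np.array([2000.0, 3000.0, 2500.0])  # the preco column

# To normalize: save the min and max so the transformation can be undone.
x_min, x_max = X.min(), X.max()
X_norm = (X - x_min) / (x_max - x_min)  # values now in [0, 1]

# To return to the original scale: invert the computation.
X_back = X_norm * (x_max - x_min) + x_min
print(X_back)  # recovers 2000, 3000, 2500
```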
Cases where scaling is not necessary
- tree-based algorithms, such as Decision Tree and Random Forest.
Other cases
For some algorithms, such as linear regression, scaling is not mandatory and does not improve accuracy: scaling the inputs or not only changes the coefficients found. However, when the inputs have different magnitudes (as with ano
and preco
in the example above), the coefficients can only be compared with one another if the inputs are scaled. That is, if you want interpretability, scale the inputs.