Why should we scale/standardize values of variables and how to reverse this transformation?

While working with multivariate prediction algorithms, I came across R's scale function, whose purpose is to scale/standardize the values of variables.

I have no difficulty using the scale function, but my question is purely conceptual.

Why should I scale the values of my variables? What's the point? Does this make a difference, for example, in the accuracy of my prediction model? And how can I reverse the transformation?

Author: bfavaretto, 2020-01-06

1 answer

Should I scale my inputs? The answer is: it depends.

The truth is that scaling your data won't make the result worse, so if in doubt, scale it.

Cases in which to scale

  1. If the model is based on distances between points, such as clustering algorithms (k-means) or dimensionality reduction (PCA), then it is necessary to scale/normalize the inputs. See the example:

Starting from the data:

    Ano  Preco
0  2000   2000
1  2010   3000
2  1970   2500

The Euclidean distance matrix of the raw values is:

       0       1       2   
0 [[   0.   1000.05  500.9 ]
1  [1000.05    0.    501.6 ]
2  [ 500.9   501.6     0.  ]]

We observe that preco dictates the distances, since its absolute values are much greater than those of ano. However, when we normalize each column to [0, 1], the result changes dramatically:

   Ano_norm  Preco_norm
0      0.75         0.0
1      1.00         1.0
2      0.00         0.5

The new Euclidean distance matrix is:

      0    1    2 
0 [[0.   1.03 0.9 ]
1  [1.03 0.   1.12]
2  [0.9  1.12 0.  ]]
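As a minimal sketch (assuming pandas and scipy are available), the two matrices above can be reproduced like this; the column names and rounding follow the example:

import pandas as pd
from scipy.spatial.distance import cdist

df = pd.DataFrame({"Ano": [2000, 2010, 1970],
                   "Preco": [2000, 3000, 2500]})

# Euclidean distances on the raw values: Preco dominates
print(cdist(df, df).round(2))

# Min-max normalization of each column to [0, 1]
df_norm = (df - df.min()) / (df.max() - df.min())
print(df_norm)

# Euclidean distances after normalization: both columns now contribute
print(cdist(df_norm, df_norm).round(2))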

Another example, referring to PCA, is this one.

  2. For algorithms such as neural networks (see this reference), which use gradient descent and activation functions, scaling the inputs:
    • gives features that are only positive both a negative and a positive part, which facilitates training.
    • prevents calculations from returning values like NaN (Not a Number) during training.
    • avoids the weights connected to inputs on different scales being updated at different rates (some faster than others), which harms learning.
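As an illustration of the first point (a hypothetical sketch using scikit-learn's StandardScaler; nothing here is prescribed by the reference above), a strictly positive feature becomes centered around zero, gaining both a negative and a positive part, and the transformation can be reversed afterwards:

import numpy as np
from sklearn.preprocessing import StandardScaler

# A strictly positive feature (e.g. prices)
precos = np.array([[2000.0], [3000.0], [2500.0], [2200.0]])

scaler = StandardScaler()
precos_std = scaler.fit_transform(precos)   # zero mean, unit variance

print(precos_std.ravel())                   # now has negative and positive values
print(scaler.inverse_transform(precos_std).ravel())  # back to the original scale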

Normalizing the outputs is also important, because of the activation function of the last layer.

In this case, to return the output to its original scale, simply save the values used to normalize and apply the inverse calculation. For example:

To normalize:

X_norm = (X - X_min)/(X_max - X_min)

To return to the original scale:

X = X_norm * (X_max - X_min) + X_min
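In code (a minimal sketch with NumPy; the variable names mirror the formulas above):

import numpy as np

X = np.array([2000.0, 3000.0, 2500.0])     # e.g. the Preco column
X_min, X_max = X.min(), X.max()            # save these to reverse the transformation

X_norm = (X - X_min) / (X_max - X_min)     # normalize to [0, 1]
X_back = X_norm * (X_max - X_min) + X_min  # back to the original scale

print(X_norm)   # [0.  1.  0.5]
print(X_back)   # [2000. 3000. 2500.]

In practice, scikit-learn's MinMaxScaler stores these minimum/maximum values for you and exposes inverse_transform for the same purpose.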

Cases where it is not necessary to scale

  1. Algorithms based on splits (cuts), such as Decision Tree and Random Forest.

Other cases

For some algorithms, such as linear regression, scaling is not mandatory and does not improve accuracy. Scaling the inputs (or not) only changes the coefficients that are found. However, when the inputs have different magnitudes (as with ano and preco in the example above), the coefficients can only be compared with each other if the inputs are scaled. That is, if you want interpretability, scale the inputs.
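A hypothetical sketch of that last point with scikit-learn (synthetic data, names invented for illustration): fitting a linear regression on raw and on standardized inputs gives identical predictions, but only the standardized coefficients are directly comparable:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
ano = rng.uniform(1970, 2010, 100)       # small numeric range
preco = rng.uniform(2000, 3000, 100)     # much larger values
X = np.column_stack([ano, preco])
y = 3.0 * ano + 0.01 * preco + rng.normal(0.0, 1.0, 100)

raw = LinearRegression().fit(X, y)
scaler = StandardScaler()
std = LinearRegression().fit(scaler.fit_transform(X), y)

print(raw.coef_)   # coefficients on the original scales (not comparable)
print(std.coef_)   # coefficients on the standardized scale (comparable)
print(np.allclose(raw.predict(X), std.predict(scaler.transform(X))))  # True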

Author: AlexCiuffa, 2020-01-07 00:12:02