How to normalize data? Any sklearn libraries?

Question

I need to normalize my data so that it lies between -1 and 1.

I used StandardScaler, but the range got bigger.

What else in sklearn could I use? There are several options in sklearn that should make this easier, but I could not get them to work; I believe I don't know how to use them.

What I tried was:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_fwf('traco_treino.txt', header=None)
plt.plot(df)

The data is in the range -4 to 4.

After attempting normalization:

from sklearn.preprocessing import StandardScaler  
scaler = StandardScaler()  
scaler.fit(df)
dftrans = scaler.transform(df)
plt.plot(dftrans)

The data is now between -10 and 10.

python sklearn redes-neurais

Author: Klel, 2018-05-15

1 answer

score 5 · Accepted Answer

StandardScaler standardizes the data to zero mean and unit variance (var = 1), not to a fixed range, so the results differ from what you expected.
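To see why the range can grow, here is a minimal sketch on made-up data (the array `x` is an assumption for illustration, not the data from the question). The values start inside [0, 1], but because their spread is small, dividing by the standard deviation pushes the largest value well past 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data (made up for this sketch): nine zeros and a single 1
x = np.array([[0.0]] * 9 + [[1.0]])

scaled = StandardScaler().fit_transform(x)

# After standardization the column has mean ~0 and standard deviation ~1
print(scaled.mean(), scaled.std())

# The range is not bounded: here it goes from roughly -0.33 up to roughly 3.0
print(scaled.min(), scaled.max())
```

Whenever the original standard deviation is below 1, as it is here, standardizing stretches the data instead of shrinking it.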

To scale the data to the range [-1, 1], use MaxAbsScaler:

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Define the data
dados = np.array([[0, 0], [300, -4], [400, 3.8], [1000, 0.5], [3000, 0]], dtype=np.float64)

dados
=> array([[  0.00000000e+00,   0.00000000e+00],
       [  3.00000000e+02,  -4.00000000e+00],
       [  4.00000000e+02,   3.80000000e+00],
       [  1.00000000e+03,   5.00000000e-01],
       [  3.00000000e+03,   0.00000000e+00]])

# Instantiate the MaxAbsScaler
p = MaxAbsScaler()

# Analyze the data and prepare the scaler
p.fit(dados)
=> MaxAbsScaler(copy=True)

# Transform the data
print(p.transform(dados))
=> [[ 0.          0.        ]
 [ 0.1        -1.        ]
 [ 0.13333333  0.95      ]
 [ 0.33333333  0.125     ]
 [ 1.          0.        ]]
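What the transformation above does can be sketched by hand: each column is divided by its largest absolute value (3000 for the first column, 4 for the second). The names `max_abs` and `manual` are introduced here for illustration only:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Same data as in the example above
dados = np.array([[0, 0], [300, -4], [400, 3.8], [1000, 0.5], [3000, 0]], dtype=np.float64)

# Largest absolute value in each column
max_abs = np.abs(dados).max(axis=0)

# Dividing each column by that value keeps every entry within [-1, 1]
manual = dados / max_abs

# The fitted scaler stores the same per-column values in max_abs_
p = MaxAbsScaler().fit(dados)
print(p.max_abs_)

# The manual result should match the scaler's output
print(np.allclose(manual, p.transform(dados)))
```

Because the data is only divided and never shifted, zeros stay at zero and the sign of every value is preserved.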

More information is available in the scikit-learn documentation or on Wikipedia: feature scaling
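Note that MaxAbsScaler only guarantees values inside [-1, 1]; a column reaches both ends only if its minimum and maximum have the same magnitude. If every column must span the full interval, one alternative that is not part of the answer above is MinMaxScaler with feature_range=(-1, 1). A minimal sketch, reusing the same example data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same example data as in the answer
dados = np.array([[0, 0], [300, -4], [400, 3.8], [1000, 0.5], [3000, 0]], dtype=np.float64)

# Map each column's minimum to -1 and its maximum to 1
scaler = MinMaxScaler(feature_range=(-1, 1))
escalado = scaler.fit_transform(dados)

print(escalado.min(axis=0))  # per-column minimum after scaling
print(escalado.max(axis=0))  # per-column maximum after scaling
```

Unlike MaxAbsScaler, this shifts the data, so an original value of 0 generally no longer maps to 0.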