Test and training samples
How do I build the Y_train and Y_test matrices? And how do I get rid of NaN?
churn_result = churn_df['TRG_is_churn']
churn_result
y = np.where(churn_result == 'True.',1,0)
This raises an error:
TypeError: invalid type comparison
Given the following DataFrame:
In [71]: df
Out[71]:
TRG_is_churn
0 True.
1 False.
2 NaN
3 Nonsense
4 NaN
5 None
6 NaN
7 NaN
8 True.
In [72]: df['TRG_is_churn'].isna()
Out[72]:
0 False
1 False
2 True
3 False
4 True
5 False
6 True
7 True
8 False
Name: TRG_is_churn, dtype: bool
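isna() only detects the missing values. To actually get rid of those rows, one option is `dropna(subset=...)` or, equivalently, a `notna()` mask. A minimal sketch, assuming a frame rebuilt from the sample above (where 'None' and 'Nonsense' are ordinary strings, not missing values):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; 'None' and 'Nonsense' are plain strings, not NaN
df = pd.DataFrame({'TRG_is_churn': ['True.', 'False.', np.nan, 'Nonsense', np.nan,
                                    'None', np.nan, np.nan, 'True.']})

# Drop rows whose label is NaN
clean = df.dropna(subset=['TRG_is_churn'])

# Same result with a boolean mask built from notna()
clean_mask = df[df['TRG_is_churn'].notna()]

# Stricter: keep only rows whose label is one of the two known values
valid = df[df['TRG_is_churn'].isin(['True.', 'False.'])]

print(clean.index.tolist())  # [0, 1, 3, 5, 8]
print(valid.index.tolist())  # [0, 1, 8]
```

The `isin` variant also discards junk values such as 'Nonsense', which `dropna` keeps because they are not NaN.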
Comparing with 'True.' returns a boolean Series:
In [73]: df['TRG_is_churn'] == 'True.'
Out[73]:
0 True
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 True
Name: TRG_is_churn, dtype: bool
This Series can be converted to integers:
In [74]: y = (df['TRG_is_churn'] == 'True.').astype(np.int8)
In [75]: y
Out[75]:
0 1
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 1
Name: TRG_is_churn, dtype: int8
The same result as a NumPy array:
In [76]: y = (df['TRG_is_churn'] == 'True.').astype(np.int8).values
In [77]: y
Out[77]: array([1, 0, 0, 0, 0, 0, 0, 0, 1], dtype=int8)
P.S. Try to avoid loops when working with Pandas/NumPy/SciPy/sklearn etc.; prefer vectorized operations like the ones above.
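For the Y_train / Y_test part of the question, one common approach (an assumption here; the answer does not show it) is scikit-learn's `train_test_split`, which shuffles and splits features and labels together so their rows stay aligned. The `feature_a` and `feature_b` columns are hypothetical placeholders for the real churn features:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical churn data: two made-up feature columns plus the label column
churn_df = pd.DataFrame({
    'feature_a': np.arange(10),
    'feature_b': np.arange(10) * 0.5,
    'TRG_is_churn': ['True.', 'False.'] * 5,
})

# Drop rows with a missing label before building X and y
churn_df = churn_df.dropna(subset=['TRG_is_churn'])

X = churn_df.drop(columns='TRG_is_churn').values
y = (churn_df['TRG_is_churn'] == 'True.').astype(np.int8).values

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)  # (7, 2) (3, 2) (7,) (3,)
```

Passing `stratify=y` as well keeps the churn/no-churn ratio similar in both parts, which often matters for imbalanced churn data.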
Author: MaxU, 2018-04-30 14:40:51