Test and training sample

How do I build the Y_train and Y_test arrays? And how do I get rid of the NaN values?

churn_result = churn_df['TRG_is_churn']
churn_result
y = np.where(churn_result == 'True.',1,0)

This returns an error:

TypeError: invalid type comparison

Author: MaxU, 2018-04-26

1 answer

Given the following DataFrame:

In [71]: df
Out[71]:
  TRG_is_churn
0        True.
1       False.
2          NaN
3     Nonsense
4          NaN
5         None
6          NaN
7          NaN
8        True.

In [72]: df['TRG_is_churn'].isna()
Out[72]:
0    False
1    False
2     True
3    False
4     True
5    False
6     True
7     True
8    False
Name: TRG_is_churn, dtype: bool

Comparing the column with 'True.' returns a boolean Series (NaN and any other value compare as False):

In [73]: df['TRG_is_churn'] == 'True.'
Out[73]:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8     True
Name: TRG_is_churn, dtype: bool

This can be converted to integers:

In [74]: y = (df['TRG_is_churn'] == 'True.').astype(np.int8)

In [75]: y
Out[75]:
0    1
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    1
Name: TRG_is_churn, dtype: int8

The same thing as a NumPy array:

In [76]: y = (df['TRG_is_churn'] == 'True.').astype(np.int8).values

In [77]: y
Out[77]: array([1, 0, 0, 0, 0, 0, 0, 0, 1], dtype=int8)
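Note that the comparison above silently turns NaN and unexpected strings into 0. If those rows should be dropped instead, and the rest split into Y_train and Y_test, something like the following sketch may work. The `feature` column, the one-third hold-out and the seed are assumptions made up for illustration; in practice `sklearn.model_selection.train_test_split` is the usual tool for the split itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the label column from above plus a made-up feature column
df = pd.DataFrame({
    'feature': np.arange(9, dtype=float),
    'TRG_is_churn': ['True.', 'False.', np.nan, 'Nonsense', np.nan,
                     'None', np.nan, np.nan, 'True.'],
})

# Map the known labels to 1/0; NaN and unexpected strings become NaN
labels = df['TRG_is_churn'].map({'True.': 1, 'False.': 0})

# Keep only the rows that have a usable label
mask = labels.notna()
X = df.loc[mask, ['feature']].to_numpy()
y = labels[mask].astype(np.int8).to_numpy()

# Shuffle the row positions once, then cut them into test and train parts
rng = np.random.default_rng(0)         # fixed seed, chosen arbitrarily
order = rng.permutation(len(y))
n_test = max(1, len(y) // 3)           # hold out roughly a third (assumption)
test_idx, train_idx = order[:n_test], order[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
Y_train, Y_test = y[train_idx], y[test_idx]
```

Only rows 0, 1 and 8 survive the filter here, so this toy split is tiny; on real data the same steps apply unchanged.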

P.S. Try to avoid explicit loops when working with Pandas/NumPy/SciPy/sklearn, etc.
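To illustrate that advice, here is a small sketch comparing an element-by-element loop with the vectorized comparison used above. The sample Series is hypothetical; both versions should give the same result, but the vectorized one does the work in a single call instead of once per element in Python.

```python
import numpy as np
import pandas as pd

# Hypothetical sample labels
s = pd.Series(['True.', 'False.', np.nan, 'True.'])

# Loop version: one Python-level comparison per element
y_loop = np.array([1 if v == 'True.' else 0 for v in s], dtype=np.int8)

# Vectorized version: a single comparison over the whole Series
y_vec = (s == 'True.').astype(np.int8).to_numpy()

# Both approaches agree element by element
same = bool((y_loop == y_vec).all())
```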

Author: MaxU, 2018-04-30 14:40:51