TypeError: can't pickle _thread._local objects when I try to use scikit-learn RFE with a model created in TensorFlow

I'm trying to use scikit-learn's RFE with models I created in TensorFlow, but when I try to train I get TypeError: can't pickle _thread._local objects. The code and the error are below:

import tensorflow as tf
import pandas as pd
from sklearn.feature_selection import RFE

data = {'atributo1':[1,2,3,4,5],'atributo2':[1,2,3,4,5],'atributo3':[1,2,3,4,5],'atributo4':[1,2,3,4,5], 'target':[1,0,1,0,1]}

base = pd.DataFrame(data)

n_hidden1 = 100
n_hidden2 = 50
n_outputs = 2

def create_model():
    model = tf.keras.Sequential([tf.keras.layers.Dense(n_hidden1,activation = 'relu'),
                             tf.keras.layers.Dense(n_hidden2,activation = 'relu'),
                             tf.keras.layers.Dense(n_outputs,activation = 'softmax')])
    model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

    return model

model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model(), batch_size = 10, epochs = 20)
rank = RFE(estimator=model,verbose=1,n_features_to_select=2)
rank.fit(base.drop('target',axis=1),base['target'])

> runfile('C:/Users/panto/.spyder-py3/temp.py', wdir='C:/Users/panto/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-5-4d89fbeba90e>", line 1, in <module>
    runfile('C:/Users/panto/.spyder-py3/temp.py', wdir='C:/Users/panto/.spyder-py3')

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/panto/.spyder-py3/temp.py", line 25, in <module>
    rank.fit(base.drop('target',axis=1),base['target'])

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py", line 144, in fit
    return self._fit(X, y)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py", line 179, in _fit
    estimator = clone(self.estimator)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py", line 64, in clone
    new_object_params[name] = clone(param, safe=False)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py", line 55, in clone
    return copy.deepcopy(estimator)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 150, in deepcopy
    y = copier(x, memo)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 169, in deepcopy
    rv = reductor(4)

TypeError: can't pickle _thread._local objects
Author: Pedro Antonio, 2019-12-17

1 answer

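Your code is almost right, but it will not run as written, for two reasons. First, build_fn should receive the function itself (build_fn=create_model), not the model returned by calling it (build_fn=create_model()). The traceback shows the TypeError being raised while sklearn's clone deep-copies an estimator parameter, which here is the already-built Keras model. Second, even with that fixed, there is a mismatch between Keras and scikit-learn's RFE (recursive feature elimination). As the RFE documentation says: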

First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are removed from the current set of features. This procedure is repeated recursively on the pruned set until the desired number of features to select is reached. (Free translation, emphasis mine)
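The procedure in that passage can be sketched by hand. This is a simplified illustration, not scikit-learn's actual implementation: the helper name manual_rfe and the synthetic data are assumptions made for the example, and it removes exactly one feature per round.

```python
import numpy as np
from sklearn.svm import SVR


def manual_rfe(make_estimator, X, y, n_features_to_select):
    """Hypothetical helper: drop the weakest feature until enough remain."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    while len(remaining) > n_features_to_select:
        # Train on the current subset of columns.
        est = make_estimator().fit(X[:, remaining], y)
        # Importance comes from coef_; a model without it fails on this line.
        importances = np.abs(np.ravel(est.coef_))
        # Remove the least important of the remaining features.
        del remaining[int(np.argmin(importances))]
    return remaining


# Synthetic data (an assumption): only columns 0 and 2 influence the target.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.normal(size=50)

selected = manual_rfe(lambda: SVR(kernel="linear"), X, y, 2)
print(selected)
```

The line that reads est.coef_ is the part that a KerasClassifier cannot satisfy.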

That is, for RFE to work, the underlying model must expose an attribute called coef_ or one called feature_importances_ after fitting. This is not the case with KerasClassifier. You can check this using your own code with a few modifications. See:

import tensorflow as tf
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.svm import SVR  # this module will be needed for the next example

data = {'atributo1':[1,2,3,4,5],'atributo2':[1,2,3,4,5],'atributo3':[1,2,3,4,5],'atributo4':[1,2,3,4,5], 'target':[1,0,1,0,1]}

base = pd.DataFrame(data)

n_hidden1 = 100
n_hidden2 = 50
n_outputs = 2

def create_model():
    model = tf.keras.Sequential([tf.keras.layers.Dense(n_hidden1,activation = 'relu'),
                             tf.keras.layers.Dense(n_hidden2,activation = 'relu'),
                             tf.keras.layers.Dense(n_outputs,activation = 'softmax')])
    model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

    return model

X = base.drop('target',axis=1).values
y = base['target'].values
model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model, batch_size = 10, epochs = 20)
model.fit(X, y)

# Show all methods and attributes
print(dir(model))

This is the output. Note the absence of the attributes mentioned in the RFE documentation:

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_keras_api_names', '_keras_api_names_v1', 'build_fn', 'check_params', 'classes_', 'filter_sk_params', 'fit', 'get_params', 'model', 'n_classes_', 'predict', 'predict_proba', 'score', 'set_params', 'sk_params']

Or simply:

print('coef_' in dir(model))
print('feature_importances_' in dir(model))

Output:

False
False
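For comparison, the same kind of check can be run on plain scikit-learn estimators without TensorFlow. This is a minimal sketch: the helper name exposes_importance and the toy data are assumptions for illustration only. It uses hasattr on fitted models, since these attributes only exist after fitting.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeClassifier


def exposes_importance(estimator):
    """Hypothetical helper: True if a fitted model has what RFE looks for."""
    return hasattr(estimator, "coef_") or hasattr(estimator, "feature_importances_")


# Toy data (an assumption), just enough to fit the models.
X = np.array([[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]], dtype=float)
y = np.array([1, 0, 1, 0, 1, 0])

linear_svr = SVR(kernel="linear").fit(X, y)              # provides coef_
rbf_svr = SVR(kernel="rbf").fit(X, y)                    # coef_ is unavailable for non-linear kernels
tree = DecisionTreeClassifier(random_state=0).fit(X, y)  # provides feature_importances_

print(exposes_importance(linear_svr))
print(exposes_importance(rbf_svr))
print(exposes_importance(tree))
```

An SVR with a non-linear kernel fails the check just as KerasClassifier does, which is why the kernel has to be linear if you use SVR with RFE.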

To confirm that the rest of your code works and that the problem lies with the Keras wrapper, run the same code with a linear SVR model. To do this, just import the module (see the code above) and replace the model with:

model = SVR(kernel="linear")
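Put together, a self-contained version of that substitution might look like the sketch below. It rebuilds the same toy data as the question using NumPy instead of pandas (an assumption made to keep the example short), and it prints the results without claiming which columns get picked, since the four attributes are identical.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# Same values as the question's DataFrame: four identical attribute columns.
X = np.tile(np.arange(1, 6, dtype=float).reshape(-1, 1), (1, 4))
y = np.array([1, 0, 1, 0, 1])

# A linear SVR exposes coef_ after fitting, so RFE can rank the features.
model = SVR(kernel="linear")
rank = RFE(estimator=model, n_features_to_select=2)
rank.fit(X, y)

print(rank.support_)  # boolean mask of the selected features
print(rank.ranking_)  # 1 marks a selected feature; larger values were dropped earlier
```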
Author: Lucas, 2019-12-18 19:20:59