ValueError: could not convert string to float: 'red'
Hello, I'm trying to build a model that classifies wine as red or white. This is my code:
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
np.random.seed(2)
# number of wine classes
classifications = 2
# load dataset
dataset = np.loadtxt('/content/wine.csv', delimiter=",")
# split dataset into sets for testing and training
X = dataset[:,1:12]
Y = dataset[:,0:1]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66, random_state=5)
# convert output values to one-hot
y_train = keras.utils.to_categorical(y_train-1, classifications)
y_test = keras.utils.to_categorical(y_test-1, classifications)
# creating model
model = Sequential()
model.add(Dense(10, input_dim=13, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
# compile and fit model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=(x_test, y_test))
My CSV file is fairly large, so I'll show only a few rows:
7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
7.8,0.88,0,2.6,98,25,67,9.968,3.2,0.68,9.8,5,red
7.8,0.76,0.04,2.3,92,15,54,997,3.26,0.65,9.8,5,red
11.2,0.28,0.56,1.9,75,17,60,998,3.16,0.58,9.8,6,red
7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
7.4,0.66,0,1.8,75,13,40,9.978,3.51,0.56,9.4,5,red
7.9,0.6,0.06,1.6,69,15,59,9.964,3.3,0.46,9.4,5,red
8,0.27,0.25,19.1,45,50,208,100.051,03.05,0.5,9.2,6,white
6.3,0.38,0.17,8.8,0.08,50,212,99.803,3.47,0.66,9.4,4,white
7.1,0.21,0.28,2.7,34,23,111,99.405,3.35,0.64,10.2,4,white
6.2,0.38,0.18,7.4,95,28,195,99.773,3.53,0.71,9.2,4,white
8.2,0.24,0.3,2.3,0.05,23,106,99.397,2.98,0.5,10,5,white
7,0.16,0.26,6.85,47,30,220,99.622,3.38,0.58,10.1,6,white
7.3,815,0.09,11.4,44,45,204,99.713,3.15,0.46,9,5,white
6.3,0.41,0.16,0.9,32,25,98,99.274,3.16,0.42,9.5,5,white
And the error is as follows:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-942872a3fef1> in <module>()
11
12 # load dataset
---> 13 dataset = np.loadtxt('/content/wine.csv', delimiter=",")
14
15 # split dataset into sets for testing and training
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in floatconv(x)
792 if '0x' in x:
793 return float.fromhex(x)
--> 794 return float(x)
795
796 typ = dtype.type
ValueError: could not convert string to float: 'red'
Please help me. Thank you!
Author: Vinicius de Aguiar Benvinda, 2020-06-28
1 answer
Hi, the problem is that each line of your file (wine.csv) contains both numbers and strings (the labels), and np.loadtxt tries to convert every field to float. One way to read this data is to use pandas together with scikit-learn's LabelEncoder, which converts your 'red' and 'white' categories into integer labels that can then be one-hot encoded. Another thing I noticed is that you may be confusing your predictors with the target: the changes I've made to the code assume that you want the numerical values as predictors and 'red' and 'white' as the target to be classified. I hope this helps.
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
LE = LabelEncoder()
np.random.seed(2)
# number of wine classes
classifications = 2
# load dataset
dataset = pd.read_csv('/content/wine.csv', header=None)
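# encode the string labels as integers ('red' -> 0, 'white' -> 1)
dataset[12] = LE.fit_transform(dataset[12])
# columns 1 to 11 are used as predictors, column 12 (the wine color) is the target
X = dataset.iloc[:,1:12].values
Y = dataset.iloc[:,12].values
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66, random_state=5)
# convert output values to one-hot; the labels are already 0 and 1, so no offset is needed
y_train = keras.utils.to_categorical(y_train, classifications)
y_test = keras.utils.to_categorical(y_test, classifications)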
# creating model
model = Sequential()
model.add(Dense(10, input_dim=11, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=(x_test, y_test))
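To see what the encoding step does without installing anything, here is a small dependency-free sketch. It is only an illustration, not the code above: the `SAMPLE` string holds three rows copied from the question, and the names `class_to_index`, `encoded` and `one_hot` are made up for this example. It reproduces the original error with a plain `float()` call, then maps the labels to integers in sorted order (the ordering LabelEncoder uses) and builds the one-hot rows that `to_categorical` would return as an array.

```python
import csv
import io

# Three rows copied from the question's sample data
# (12 numeric columns followed by the string label).
SAMPLE = """\
7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
8,0.27,0.25,19.1,45,50,208,100.051,03.05,0.5,9.2,6,white
6.3,0.38,0.17,8.8,0.08,50,212,99.803,3.47,0.66,9.4,4,white
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))

# np.loadtxt converts every field with float(); the label column is what breaks it.
try:
    float(rows[0][12])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Split each row into numeric features and the string label.
features = [[float(value) for value in row[:12]] for row in rows]
labels = [row[12] for row in rows]

# Integer-encode the labels in sorted order, as LabelEncoder does.
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]

# One-hot encode the integers: one column per class, 1.0 in the class's column.
one_hot = [
    [1.0 if index == code else 0.0 for index in range(len(classes))]
    for code in encoded
]
print(encoded)
print(one_hot)
```

Note that the encoded values start at 0, which is the indexing `to_categorical` expects for its class indices.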
Author: Ricardo Tenorio, 2020-07-01 16:56:27