ValueError: could not convert string to float: 'red'

Question

Hello, I'm trying to build a model that classifies wine as red or white. This is my code:

from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense 
import numpy as np

np.random.seed(2)

# number of wine classes
classifications = 2

# load dataset
dataset = np.loadtxt('/content/wine.csv', delimiter=",")

# split dataset into sets for testing and training
X = dataset[:,1:12]
Y = dataset[:,0:1]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66, random_state=5)

# convert output values to one-hot
y_train = keras.utils.to_categorical(y_train-1, classifications)
y_test = keras.utils.to_categorical(y_test-1, classifications)


# creating model
model = Sequential()
model.add(Dense(10, input_dim=13, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))

# compile and fit model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=(x_test, y_test))

My CSV file is fairly large, so I'll include only a few rows:

7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
7.8,0.88,0,2.6,98,25,67,9.968,3.2,0.68,9.8,5,red
7.8,0.76,0.04,2.3,92,15,54,997,3.26,0.65,9.8,5,red
11.2,0.28,0.56,1.9,75,17,60,998,3.16,0.58,9.8,6,red
7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
7.4,0.66,0,1.8,75,13,40,9.978,3.51,0.56,9.4,5,red
7.9,0.6,0.06,1.6,69,15,59,9.964,3.3,0.46,9.4,5,red
8,0.27,0.25,19.1,45,50,208,100.051,03.05,0.5,9.2,6,white
6.3,0.38,0.17,8.8,0.08,50,212,99.803,3.47,0.66,9.4,4,white
7.1,0.21,0.28,2.7,34,23,111,99.405,3.35,0.64,10.2,4,white
6.2,0.38,0.18,7.4,95,28,195,99.773,3.53,0.71,9.2,4,white
8.2,0.24,0.3,2.3,0.05,23,106,99.397,2.98,0.5,10,5,white
7,0.16,0.26,6.85,47,30,220,99.622,3.38,0.58,10.1,6,white
7.3,815,0.09,11.4,44,45,204,99.713,3.15,0.46,9,5,white
6.3,0.41,0.16,0.9,32,25,98,99.274,3.16,0.42,9.5,5,white

And the error is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-942872a3fef1> in <module>()
     11 
     12 # load dataset
---> 13 dataset = np.loadtxt('/content/wine.csv', delimiter=",")
     14 
     15 # split dataset into sets for testing and training

/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in floatconv(x)
    792         if '0x' in x:
    793             return float.fromhex(x)
--> 794         return float(x)
    795 
    796     typ = dtype.type

ValueError: could not convert string to float: 'red'

Please help. Thank you!

python csv machine-learning

Author: Vinicius de Aguiar Benvinda, 2020-06-28

1 answer

score 1 · Answer 1

Hi, the problem is that each line of your file (wine.csv) contains both numbers and strings (the labels), and np.loadtxt tries to convert every field to float. One way to read this data is to use pandas together with scikit-learn's LabelEncoder, which converts your 'red' and 'white' categories into integer labels that to_categorical can then turn into one-hot vectors. Another thing I noticed is that you might be confusing your predictors with the target. The changes I made to the code assume that you want the numerical values as predictors and 'red' and 'white' as the target classes. I hope this helps:

from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense 
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

LE = LabelEncoder()

np.random.seed(2)

# number of wine classes
classifications = 2

# load dataset
dataset = pd.read_csv('/content/wine.csv', header=None)

dataset[12] = LE.fit_transform(dataset[12])

X = dataset.iloc[:,1:12].values
Y = dataset.iloc[:,12]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66, 
                                                random_state=5)

# convert output values to one-hot
# (LabelEncoder already yields 0 and 1, so no offset is needed)
y_train = keras.utils.to_categorical(y_train, classifications)
y_test = keras.utils.to_categorical(y_test, classifications)

# creating model
model = Sequential()
model.add(Dense(10, input_dim=11, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))

# compile and fit model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=(x_test, y_test))
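
If you would rather stay with NumPy only, here is a minimal sketch of the same idea: read the 12 numeric columns and the label column separately, then encode the labels by hand. The names sample, features, labels, classes and one_hot are just illustrative, and the four rows are copied from the question so the snippet runs without the real file. I have not tested it against the full wine.csv.

```python
import io
import numpy as np

# Four rows copied from the question, standing in for /content/wine.csv.
sample = io.StringIO(
    "7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red\n"
    "7.8,0.88,0,2.6,98,25,67,9.968,3.2,0.68,9.8,5,red\n"
    "8,0.27,0.25,19.1,45,50,208,100.051,03.05,0.5,9.2,6,white\n"
    "6.3,0.38,0.17,8.8,0.08,50,212,99.803,3.47,0.66,9.4,4,white\n"
)

# Read only the numeric columns (0..11) as floats.
features = np.loadtxt(sample, delimiter=",", usecols=range(12))

# Rewind and read the last column (12) as strings.
sample.seek(0)
labels = np.loadtxt(sample, delimiter=",", usecols=12, dtype=str)

# np.unique sorts the class names, so 'red' -> 0 and 'white' -> 1.
classes, y = np.unique(labels, return_inverse=True)

# One-hot encode by indexing the rows of an identity matrix.
one_hot = np.eye(len(classes))[y]

print(features.shape)  # (4, 12)
print(classes, y)
print(one_hot)
```

Because the encoded labels already start at 0, they can go straight into to_categorical (or the identity-matrix trick above) without subtracting 1. LabelEncoder also sorts the class names, so it should give the same 'red' -> 0, 'white' -> 1 mapping.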