Task "Number of words in the text" (Python): the code counts one extra word

You need to determine how many different words the text contains. Here, a word is any sequence of consecutive non-space characters; words are separated by one or more spaces or end-of-line characters. For example, "Share" and "Share," are two different words. My attempt:

inFile = open('input.txt', 'r', encoding='utf8')
a = str(inFile.readlines())
print(len(set(a.split())))

Test text:

She sells sea shells on the sea shore;
The shells that she sells are sea shells I'm sure.
So if she sells sea shells on the sea shore,
I'm sure that the shells are sea shore shells.

It should be 19 words, but for some reason I get 20. Where is the error? Thanks!

Author: Lazarevna, 2018-08-13

1 answer

The same idea, a little shorter, with read() in place of str(readlines()):

In [250]: len(set(open(r'C:\Temp\a.txt').read().split()))
Out[250]: 19

Even better, use pathlib:

from pathlib import Path

In [251]: len(set(Path(r'C:\Temp\a.txt').read_text(encoding='utf-8').split()))
Out[251]: 19
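
As for where the extra word probably comes from: str() applied to the list returned by readlines() gives the list's repr, so brackets, quotes, commas and a literal backslash-n end up glued to the first and last word of every line. A minimal sketch, assuming the file holds exactly the four test lines from the question (the variable names here are only for illustration):

```python
# The four test lines as readlines() would return them (assumed sample data).
lines = [
    "She sells sea shells on the sea shore;\n",
    "The shells that she sells are sea shells I'm sure.\n",
    "So if she sells sea shells on the sea shore,\n",
    "I'm sure that the shells are sea shore shells.\n",
]

# str() of a list is its repr: brackets, quotes, commas and a literal
# backslash + n become part of the text that gets split.
words_from_repr = set(str(lines).split())

# Joining the lines gives the real file contents, the same as read().
words_from_text = set("".join(lines).split())

print(len(words_from_repr))  # 20
print(len(words_from_text))  # 19

# Tokens that exist only because of the repr decoration.
print(sorted(words_from_repr - words_from_text))
```

Most decorated tokens simply replace their clean counterparts, so the count stays the same. The exception is the fourth line, which starts with I'm: its repr token carries a leading double quote and so differs from the plain I'm in the middle of the second line, which adds one word.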
Author: MaxU, 2018-08-13 15:19:44