How to convert a binary file to ASCII on UNIX?

I am looking for a certain string in a fairly large file:

$ ls -lh archivo.csv
-rw-rw-rw- 1 yo yo 723M Dec 10 10:46 archivo.csv

If I use grep, the matching line is not printed; I only get a notice that the file contains a match:

$ grep "12345" archivo.csv
Binary file archivo.csv matches

Checking the file type, I see that it is:

$ file archivo.csv
archivo.csv: ISO-8859 text, with very long lines, with CRLF line terminators

I converted it to UNIX line endings with dos2unix:

$ dos2unix archivo.csv
dos2unix: converting file archivo.csv to Unix format...

But the problem keeps showing up:

$ grep "12345" archivo.csv
Binary file archivo.csv matches

I then noticed that grep has an option for searching binary files, -a:

$ grep -a "12345" archivo.csv
12345  esto es un test

And man grep says:

-a, --text
    Process a binary file as if it were text; 
    this is equivalent to the --binary-files=text option.

Even so, I wonder: how can I convert this binary file to ASCII?

 8
Author: fedorqui 'SO deja de dañar', 2015-12-10

2 answers

Strictly speaking, every file is binary (obviously), but when its bytes are interpreted according to some encoding X, we say that the file has encoding X (or is encoded in X).

In your case the file is not binary: it is encoded in ISO-8859, so you must use tools that understand that encoding.

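The -a option makes grep treat the file as text even though it contains bytes that grep would otherwise take as a sign of binary data (e.g. a NUL byte, \x0, or a byte sequence that is not valid in the current locale's encoding).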
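
To see what triggers the "Binary file ... matches" message, here is a small sketch. It assumes GNU grep and uses a throwaway file called demo.txt (a made-up name); the NUL byte stands in for any byte that grep cannot treat as text.

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"

# One line of text with a NUL byte (\000) in the middle.
printf 'abc\000def 12345 test\n' > demo.txt

# Without -a, GNU grep only reports that the file matches
# (the exact wording of the message varies between versions).
grep '12345' demo.txt || true

# With -a the matching line itself is printed, NUL byte included.
grep -a '12345' demo.txt
```

Other grep implementations may decide differently what counts as binary, so treat the first command's output as illustrative.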

So, in your case, you should convert the file to an encoding better suited to your tools. There are many tools for this; the one I like most is iconv, which here would be something like:

$ iconv -f ISO-8859-15 -t UTF-8 foo >foo.utf

(Note: instead of UTF-8 you could convert to ASCII as you ask, but then you might lose information present in the original file, such as the symbol §.)
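
The loss mentioned in the note can be seen on a one-line sample. This is only a sketch: it assumes an iconv that accepts the //TRANSLIT suffix (glibc and GNU libiconv do), and sample.txt is a made-up file name.

```shell
cd "$(mktemp -d)"

# "Art. § 5" in ISO-8859-1: the section sign is the single byte 0xA7 (octal 247).
printf 'Art. \247 5\n' > sample.txt

# A strict conversion to ASCII fails, because § has no ASCII equivalent.
iconv -f ISO-8859-1 -t ASCII sample.txt || echo "strict ASCII conversion failed"

# //TRANSLIT substitutes an approximation (often "?"), so the symbol is lost.
iconv -f ISO-8859-1 -t ASCII//TRANSLIT sample.txt || true

# Converting to UTF-8 keeps the symbol, now stored as the two bytes 0xC2 0xA7.
iconv -f ISO-8859-1 -t UTF-8 sample.txt > sample.utf
od -An -tx1 sample.utf
```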

For example, with this sample file we have:

$ file samples7.var
samples7.var: HTML document, ISO-8859 text
$ grep Deut samples7.var
Binary file samples7.var matches
$ grep -a Deut samples7.var
<TITLE>German / Deutsch S▒d (ISO Latin-1 / ISO 8859-1)</TITLE>
<H1>German / Deutsch S▒d (ISO Latin-1 / ISO 8859-1)</H1>
$ iconv -f ISO-8859-15 -t UTF-8 samples7.var > samples7.var.utf
$ file samples7.var.utf
samples7.var.utf: HTML document, UTF-8 Unicode text
$ grep Deut samples7.var.utf
<TITLE>German / Deutsch Süd (ISO Latin-1 / ISO 8859-1)</TITLE>
<H1>German / Deutsch Süd (ISO Latin-1 / ISO 8859-1)</H1>

As we can see, this lets you view and filter the content correctly without losing information.
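
For a file as large as the one in the question, the conversion and the search can also be chained so that no converted copy is written to disk. The following is a sketch under assumptions: that the data really is ISO-8859-1 (file only reports "ISO-8859", so ISO-8859-15 or another variant is just as possible), and that a small generated archivo.csv stands in for the real one.

```shell
cd "$(mktemp -d)"

# Stand-in for the real file: two CRLF-terminated lines, with "ñ" stored as
# the single ISO-8859-1 byte 0xF1 (octal 361).
printf '12345;a\361o\r\n67890;otro\r\n' > archivo.csv

# Ask file for its guess of the character set (a heuristic, not a guarantee).
file --mime-encoding archivo.csv || true

# Convert on the fly and search the UTF-8 stream; nothing is written to disk.
iconv -f ISO-8859-1 -t UTF-8 archivo.csv | grep '12345'
```

If the source encoding is guessed wrong, iconv will usually still produce output, just with the wrong characters, so it is worth checking a few accented words by eye.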

Finally, dos2unix does not help here: it only changes the line terminators (CRLF to LF) and leaves the character encoding untouched, so the non-ASCII bytes that make grep report the file as binary are still there (see dos2unix).
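
A quick way to check that line endings and character encoding are independent of each other: in this sketch tr -d '\r' stands in for dos2unix (an assumption, made so the example runs without dos2unix installed; the two are not identical in every edge case), and linea.txt is a made-up file.

```shell
cd "$(mktemp -d)"

# One CRLF-terminated line with "ü" as the ISO-8859-1 byte 0xFC (octal 374).
printf 'S\374d\r\n' > linea.txt

# Remove the carriage returns, which is the essence of what dos2unix does.
tr -d '\r' < linea.txt > linea.unix

# Dump both files byte by byte: 0d disappears, fc is still there.
od -An -tx1 linea.txt
od -An -tx1 linea.unix
```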

 10
Author: josejuan, 2015-12-10 10:40:46

What you ask is somewhat broad; for me it would depend on what the file contains and what I want to extract from it. If the text that grep -a prints is good enough for you and you don't need anything more from the file, you can simply use an empty pattern, which matches every line:

grep -a '' archivo.csv > archivo_texto.csv
 4
Author: ArthurChamz, 2015-12-10 11:58:25