String encoding in Java

I'm learning Java. While writing some code, I needed to re-encode a string that I receive. After reading a few posts and articles on the subject, I realized that Java strings have no notion of an encoding: to change the encoding, you convert the string to a byte array, change the encoding of the array, and turn the result back into a string. I ended up with something like this:

String data = "Текст который получаю";
data = new String(data.getBytes("utf8"), "utf8");

But then I heard a lot of criticism that this is somehow wrong, so here are the questions I would like answered:

  1. Did I get it right?
  2. Am I implementing this correctly in the code?
Author: Fariz Mamedow, 2018-01-26

1 answer

An encoding is just a table in which each letter corresponds to a number (remember that a computer stores only numbers?). So an encoding only matters when we have an array of numbers (a byte[], for example), need to turn it into a string, and have to decide which letter each number stands for (or when we go in the opposite direction).

So the encoding only comes into play when a string is converted to an array of bytes, or vice versa. Writing to your socket is exactly that case.
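To make the "table" idea concrete, here is a small sketch that encodes one Cyrillic letter with three different tables. The class and method names are made up for illustration, and it assumes your JDK ships the windows-1251 charset (it is not one of the charsets guaranteed by StandardCharsets, although common JDK builds include it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingTable {
    // Encodes the text with the given charset and returns the bytes as hex, e.g. "D0 96".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF)); // mask to print the byte as unsigned
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String letter = "\u0416"; // Cyrillic capital letter Zhe, written as an escape to avoid source-file encoding issues
        // Assumption: windows-1251 is available in this JDK.
        Charset cp1251 = Charset.forName("windows-1251");

        System.out.println("UTF-8:        " + hex(letter, StandardCharsets.UTF_8));    // D0 96
        System.out.println("windows-1251: " + hex(letter, cp1251));                    // C6
        System.out.println("UTF-16BE:     " + hex(letter, StandardCharsets.UTF_16BE)); // 04 16
    }
}
```

The string is the same in all three calls; only the bytes differ. That is why the encoding is a property of the byte array, not of the String.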

Roughly speaking, you read an array of bytes from the socket:

byte[] data = socket.getInputStream().readAllBytes(); // Java 9+; blocks until the peer closes the connection

Next, we need to interpret these numbers as a string using the correct table (encoding). Here the encoding determines which letter each number is converted to:

new String(data, "UTF-8");

And the other way around: we have a string and want to convert it to an array of bytes, so we specify the encoding, that is, which number each letter is converted to:

String data = "abc";
byte[] array = data.getBytes("UTF-8");
socket.getOutputStream().write(array);
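Putting both directions together, a minimal round trip might look like the sketch below. It is not tied to a real socket: in-memory streams stand in for socket.getOutputStream() and socket.getInputStream(), and the send/receive helpers are hypothetical names. It uses StandardCharsets.UTF_8 instead of the "UTF-8" string, which avoids the checked UnsupportedEncodingException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Sender side: turn the string into bytes using an explicit charset.
    static void send(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // Receiver side: collect all bytes first, then decode them with the same charset.
    // Decoding only after everything has arrived avoids cutting a multi-byte character in half.
    static String receive(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442, abc"; // "Привет, abc"

        // Stand-in for the socket: bytes written here are read back below.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(wire, original);
        String received = receive(new ByteArrayInputStream(wire.toByteArray()));

        System.out.println(original.equals(received)); // true
    }
}
```

This also shows why encoding and decoding with the same charset in one expression, as in the question, changes nothing: the string goes to bytes and comes straight back as the same string.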

If you have problems with Russian letters, you are most likely specifying the wrong encoding for them. Which one to specify depends on what actually arrives from the socket, so you need to inspect the incoming bytes.
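As a rough illustration of what "wrong encoding" looks like, the sketch below pretends the other side sent windows-1251 bytes and decodes them two ways. The decodesCleanly helper is a made-up name, and the choice of windows-1251 is only an assumption for the example; your peer may use something else.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class WrongCharset {
    // Returns true if the bytes are valid text in the given charset.
    // Unlike new String(bytes, charset), a decoder set to REPORT fails instead of silently substituting U+FFFD.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption for the example: the peer encodes its text as windows-1251.
        Charset cp1251 = Charset.forName("windows-1251");
        String original = "\u0422\u0435\u043A\u0441\u0442"; // "Текст"
        byte[] fromPeer = original.getBytes(cp1251);

        String wrong = new String(fromPeer, StandardCharsets.UTF_8); // wrong table: replacement characters
        String right = new String(fromPeer, cp1251);                 // same table the sender used

        System.out.println(wrong.equals(original));                            // false
        System.out.println(right.equals(original));                            // true
        System.out.println(decodesCleanly(fromPeer, StandardCharsets.UTF_8));  // false
    }
}
```

Note that once the bytes have been decoded with the wrong table, the damage is already in the String; calling getBytes on it again with another charset does not bring the original letters back. The fix has to happen at the point where the bytes are first turned into a string. A strict check like decodesCleanly can only rule a charset out: most single-byte encodings accept almost any byte sequence, so passing the check does not prove the guess is right.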

Author: Uraty, 2018-01-26 09:30:38