Encoding of Russian letters in HTML received via URL
The code runs and the received data is written to the database, but it shows up as ���������. How do I specify the encoding?
new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            URL url = new URL("http://pomni.info/pomni/home/view/kaloriinosti_productov.html");
            URLConnection conn = url.openConnection();
            BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), "utf-8"));
            String inputLine;
            ContentValues contentValues = new ContentValues();
            Pattern pattern1 = Pattern.compile("<td>(\\D+)<\\/td>");
            while ((inputLine = br.readLine()) != null) {
                Matcher m = pattern1.matcher(inputLine);
                if (m.find()) {
                    System.out.println(m.group(1));
                    contentValues.put(SQLHelper.KEY_PRODUCT, m.group(1));
                    sqlHelper.insertProduct(contentValues);
                }
            }
            br.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}).start();
1 answer
The server's response headers do not declare a charset, but the page's own meta tag says it is windows-1251:
<META http-equiv="content-type" content="text/html; charset=windows-1251">
So if you read it as UTF-8, you get nothing but diamonds (replacement characters) :) Pass "windows-1251" to the InputStreamReader instead of "utf-8".
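A minimal offline sketch of that effect, assuming the page bytes really are windows-1251 as the meta tag states. The class name CharsetDemo, the helper readFirstLine and the sample table cell are invented for illustration and stand in for the real page:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: reads the first line of a stream, decoding it with the given charset.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cp1251 = Charset.forName("windows-1251");

        // Simulate the server: a table cell containing a Russian word,
        // encoded as windows-1251. The word is written with Unicode escapes
        // so the example does not depend on the source file's encoding.
        String cell = "<td>\u041C\u043E\u043B\u043E\u043A\u043E</td>";
        byte[] page = cell.getBytes(cp1251);

        // Wrong charset: the Cyrillic bytes are not valid UTF-8,
        // so the decoder substitutes U+FFFD replacement characters.
        String wrong = readFirstLine(new ByteArrayInputStream(page), StandardCharsets.UTF_8);

        // Matching charset: the original text is recovered.
        String right = readFirstLine(new ByteArrayInputStream(page), cp1251);

        System.out.println("decoded as UTF-8:        " + wrong);
        System.out.println("decoded as windows-1251: " + right);
    }
}
```

Decoding the same bytes both ways makes the difference visible without a network call.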
Author: Eugene Krivenja, 2017-10-01 20:58:14