How to remove accents and other diacritical marks from a string in Java?

How can I remove accents and other diacritical marks from a string in Java? For example:

String s = "maçã";
String semAcento = ???; // result: "maca"
Author: UzumakiArtanis, 2013-12-11

2 answers

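I usually use a regex together with the Normalizer class. NFD normalization splits each accented character into its base letter plus separate combining marks, and the regex then removes everything that is not ASCII. Like this: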

import java.text.Normalizer;

// Decompose with NFD, then drop every character that is not ASCII.
public static String removerAcentos(String str) {
    return Normalizer.normalize(str, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
}
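
A minimal, self-contained sketch of how this method could be placed in a runnable class and applied to the example from the question. The class name `RemoverAcentosDemo` and the local variable `decomposed` are hypothetical and only there for illustration:

```java
import java.text.Normalizer;

class RemoverAcentosDemo {
    static String removerAcentos(String str) {
        // NFD turns "ç" into "c" + U+0327 and "ã" into "a" + U+0303.
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        // The combining marks are not ASCII, so this regex removes them.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String s = "maçã";
        String semAcento = removerAcentos(s);
        System.out.println(semAcento); // prints "maca"
    }
}
```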
 74
Author: Rodrigo Sasaki, 2014-01-18 10:57:12

If you are on Java 7+, you can use this solution from Stack Overflow in English (SOen): https://stackoverflow.com/a/1215117/1518921

First, add these imports:

import java.text.Normalizer;
import java.util.regex.Pattern;

Then add this method to your main class, or to whichever other class you use:

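public static String deAccent(String str) {
    // Decompose each accented character into its base letter plus combining marks.
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    // Match runs of characters in the Combining Diacritical Marks block (U+0300 to U+036F).
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
}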

Usage would look something like this:

System.out.print(deAccent("Olá, mundo!"));

It uses the regular expression (regex) \p{InCombiningDiacriticalMarks}+ to match the combining marks that normalization separates from their base letters, and replaces them with an empty string.

See it working on IDEONE: https://ideone.com/MtgLAC
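
The two regexes in this thread do not behave identically: `[^\p{ASCII}]` removes every non-ASCII character that remains after normalization, while `\p{InCombiningDiacriticalMarks}+` removes only the combining marks. The difference shows up with characters such as "ß", which NFD does not decompose. The sketch below compares both; the names `AccentComparison`, `stripMarks`, and `stripNonAscii` are hypothetical, and keeping the patterns in static constants is an optional choice, not part of either answer:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class AccentComparison {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Removes only the combining marks produced by NFD decomposition.
    static String stripMarks(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(nfd).replaceAll("");
    }

    // Removes every character outside the ASCII range after NFD decomposition.
    static String stripNonAscii(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return NON_ASCII.matcher(nfd).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripMarks("Olá, mundo!"));    // prints "Ola, mundo!"
        System.out.println(stripNonAscii("Olá, mundo!")); // prints "Ola, mundo!"
        System.out.println(stripMarks("Straße"));         // prints "Straße"
        System.out.println(stripNonAscii("Straße"));      // prints "Strae"
    }
}
```

If the input may contain letters that have no decomposition, the combining-marks version keeps them intact, whereas the ASCII version silently drops them.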

 7
Author: Guilherme Nascimento, 2017-12-06 18:27:05