UTF-8 and JTextArea
I have two JTextAreas. One of them contains a Unicode escape such as \u0645, and I want the other JTextArea
to show the character that this escape represents. If I pass the escape text to the JTextArea at run time, it shows
the escape itself, not the character, but if I write the escape directly in the argument of JTextArea's setText method in my source code, it works correctly!
Why? And can I pass a string of such escapes from one JTextArea to another?
Thanks
This code displays a character in one text area and the corresponding Unicode escape string in the other:
import java.awt.*;
import javax.swing.*;

public class FrameTest {
    public static void main(String[] args) {
        JFrame jf = new JFrame("Demo");
        Container cp = jf.getContentPane();
        cp.setLayout(new BorderLayout());
        JTextArea ta1 = new JTextArea(20, 20);
        JTextArea ta2 = new JTextArea(20, 20);
        // The compiler replaces this escape with the actual character U+0645.
        char c = '\u0645';
        ta1.setText(String.valueOf(c));
        // Build the six-character escape text: backslash, 'u', four hex digits.
        String s = String.format("\\u%04x", (int) c);
        ta2.setText(s);
        cp.add(ta1, BorderLayout.WEST);
        cp.add(ta2, BorderLayout.EAST);
        jf.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        jf.setSize(500, 100);
        jf.setVisible(true);
    }
}

So, if you have a long text of such characters, you will need to go through the string character by character (using charAt(int)
or toCharArray()), process each character with String.format("\\u%04x", (int) c),
and append the result to the target string. (Preferably use a StringBuilder.)
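The loop described above could be sketched roughly as follows. The class name UnicodeEscaper and the method name toEscapes are made up for this illustration; the sketch assumes that every UTF-16 code unit should be escaped, so a character outside the Basic Multilingual Plane comes out as two escapes (one per surrogate).

```java
final class UnicodeEscaper {
    // Hypothetical helper: writes every UTF-16 code unit of the input
    // as a backslash, the letter u and four lowercase hex digits.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Arabic letters, written as escapes so the source stays ASCII.
        System.out.println(toEscapes("\u0645\u0631"));
    }
}
```

With the two text areas from the example, this would presumably be used as ta2.setText(UnicodeEscaper.toEscapes(ta1.getText())).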
"If I put the code point directly in the setText method of the JTextArea, it works correctly!"
If by that you mean something like myTextArea.setText("\u0645"), then the problem is pretty clear:
The Java compiler interprets \u as the start of a Unicode escape. This means that in Java source code, the six characters \u0645 are completely equivalent to actually putting the character م (Unicode character U+0645 ARABIC LETTER MEEM) in the same place.
So the next two lines do the same:
myTextArea.setText("\u0645");
myTextArea.setText("م");
The reason for this is that \u0645 is converted to the corresponding Unicode character at compile time.
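A small check can make the compile-time conversion visible. This is only a sketch; the class name EscapeEquivalence and its method are invented for the illustration. It deliberately avoids typing the Arabic letter itself, so the result does not depend on the encoding the source file is compiled with.

```java
final class EscapeEquivalence {
    // Returns true when the escaped literal turns out to be a single char
    // whose code point is U+0645, i.e. the escape is gone at run time.
    static boolean escapeIsSingleChar() {
        String s = "\u0645"; // the compiler has already replaced the escape here
        return s.length() == 1 && s.codePointAt(0) == 0x0645;
    }

    public static void main(String[] args) {
        System.out.println(escapeIsSingleChar());
        // Prints the Unicode name that the runtime knows for this code point.
        System.out.println(Character.getName("\u0645".codePointAt(0)));
    }
}
```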
It is a completely different situation if you have a Java string that contains the 6 characters \u0645 at run time, for example because the user typed them into a text area. Such a string can be written as the Java string literal "\\u0645"
(note the double backslash, which prevents the compiler from interpreting the sequence as a Unicode escape).
In this case, you can take everything from the third character to the end ("\\u0645".substring(2)),
parse it as a hexadecimal number (using Integer.parseInt(theString, 16)),
and cast the result to char. You will then have a char value
containing the actual Unicode character.
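Here is a minimal sketch of those steps, plus one possible way to apply them to a whole string read from a text area. The names UnicodeUnescaper, parseEscape and unescape are hypothetical, and the sketch assumes every escape has exactly four hex digits; anything that does not match that shape is copied through unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {
    // Matches a literal backslash, the letter u and exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Converts one six-character escape into the char it stands for.
    static char parseEscape(String escape) {
        String hex = escape.substring(2);        // drop the backslash and the u
        return (char) Integer.parseInt(hex, 16); // hex digits -> number -> char
    }

    // Replaces every escape found in the text and keeps everything else as is.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append(parseEscape(m.group()));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the compiler from resolving the escape.
        System.out.println(unescape("\\u0645").length());
    }
}
```

To move the text from one JTextArea to another, something like ta2.setText(UnicodeUnescaper.unescape(ta1.getText())) should then show the characters instead of the escapes.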