
Java and “\u” (backslash u)

This article is about escaping “\u” (backslash u), the prefix of a Unicode escape, in Java.

Problem Statement:

I have a string that holds a DOS path, something like "\sample\user_data\example". The “\u” (backslash u) in “\user_data” causes an “invalid Unicode” JavaScript error in IE, so my page isn’t displayed. Because I was not able to escape it, I tried to replace “\u” in the string with something like "\ u" (backslash, space, u). This does not work either: the Java compiler does not allow a bare “\u” and gives an "Invalid unicode character sequence" error when I use it with replaceAll.

Que 1: Can I escape the \u sequence somehow in Java or JavaScript?
Que 2: How can I replace every "\u" in a string with something else, like "\ u"?

I posted this query to several Java and JavaScript groups. Here are the solutions I found.

Before going into the details of the solution, let us look at Unicode in brief.

What is a Unicode Character?

The char type holds a 16-bit Unicode value (a UTF-16 code unit)
Unicode is a superset of the ASCII character set that also covers non-English characters
Any such character can be written as a literal using the escape character (backslash \) and its hexadecimal representation
'\udddd' // where each 'd' is a hex digit (0 - F)
Single characters are written within single quotes
'a' // char literal
'9' // char literal
There are three exceptions that require the use of the escape character
Single quote ' \' ' displays as '
Double quote ' \" ' displays as "
Backslash ' \\ ' displays as \
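
As a small illustration of the points above, here is a sketch that writes the same character both ways and prints a few codes. The class name CharLiteralDemo and the helper hex are made up for this example; they are not part of any library.

```java
public class CharLiteralDemo {
    // Formats the numeric code of a char as four uppercase hex digits.
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        char letter = 'a';          // plain char literal
        char sameLetter = '\u0061'; // the same character written as a Unicode escape
        char quote = '\'';          // the single quote needs the escape character
        char backslash = '\\';      // and so does the backslash itself

        System.out.println(letter == sameLetter); // true
        System.out.println(hex(letter));          // 0061
        System.out.println(hex(quote));           // 0027
        System.out.println(hex(backslash));       // 005C
        System.out.println("He said \"hi\"");     // He said "hi"
    }
}
```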

There are certain special characters which can be represented by escape sequences

Esc Char    Unicode Char    Definition
\n          \u000A          newline
\t          \u0009          tab
\b          \u0008          backspace
\r          \u000D          return
\f          \u000C          form feed
\ddd                        octal value

Octal character constants can have three digits or less (\000 through \377)
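
To check the table for yourself, a minimal sketch follows. It compares each escape sequence with its numeric code written as a plain hex number, because the Unicode escapes for newline and return cannot be typed into a literal. The class name EscapeSequenceDemo and the helper code are hypothetical.

```java
public class EscapeSequenceDemo {
    // Returns the numeric code of the first char of s.
    static int code(String s) {
        return s.charAt(0);
    }

    public static void main(String[] args) {
        // Each escape sequence is compared with the hex code from the table.
        System.out.println(code("\n") == 0x000A); // true
        System.out.println(code("\t") == 0x0009); // true
        System.out.println(code("\b") == 0x0008); // true
        System.out.println(code("\r") == 0x000D); // true
        System.out.println(code("\f") == 0x000C); // true

        // Octal escapes: 101 in octal is 65 in decimal, the letter A.
        System.out.println("\101");       // A
        System.out.println(code("\377")); // 255, the largest octal escape
    }
}
```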

Now onto the Solution approaches:

One obvious solution would be:

The \ symbol is used for escaping special characters. If you want a path like the one you describe, you should escape the backslashes themselves, like this:
\\sample\\user_data\\example

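This is not enough on its own.

Reason:

The compiler translates Unicode escapes at the very start of the compilation process, before it looks at string literals.
When the compiler encounters \u (backslash u), it assumes a Unicode escape, since a Unicode escape starts with \u, and it expects four hexadecimal digits to follow. A doubled backslash (\\u) is not read as a Unicode escape, so the line above does compile, but the resulting string still holds a single \u, and the same error comes back as soon as that value is written into a JavaScript string literal.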
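
As far as the Java compiler itself is concerned, the doubled form can be checked with a short sketch like the one below; the trouble only starts when the value reaches JavaScript. The class name DoubledBackslashDemo and the constant PATH are made up for this example. Note that the comments avoid spelling out backslash u, since the compiler translates Unicode escapes inside comments too.

```java
public class DoubledBackslashDemo {
    // The path from the problem statement, written with doubled backslashes.
    static final String PATH = "\\sample\\user_data\\example";

    public static void main(String[] args) {
        // Prints the path with single backslashes, as it is stored in memory.
        System.out.println(PATH);

        // Each doubled backslash in source is one character in the string.
        System.out.println(PATH.length()); // 25

        // The stored value still contains a backslash followed by the letter u.
        System.out.println(PATH.indexOf("\\u")); // 7
    }
}
```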

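If you are looking for a solution in Java, it is straightforward.

You can walk the string character by character and build the escaped result in a StringBuffer. The sample below writes every control character (\u0000 through \u001F) as \uXXXX text and copies everything else unchanged.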

Sample code:

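// A lone backslash-u does not compile, so the backslash is doubled here.
String s = "your string with \\u";
StringBuffer sbuffer = new StringBuffer();
for (int i = 0; i < s.length(); i++) {
    char ch = s.charAt(i);

    // Control characters (U+0000 to U+001F) are written as a textual escape.
    if (ch >= '\u0000' && ch <= '\u001F') {
        String ss = Integer.toHexString(ch);
        sbuffer.append("\\u");
        // Left-pad the hex value with zeros to four digits.
        for (int k = 0; k < 4 - ss.length(); k++) {
            sbuffer.append('0');
        }
        sbuffer.append(ss.toUpperCase());
    } else {
        sbuffer.append(ch);
    }
}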
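
For Que 2, one possible approach is String.replace, which works on literal text rather than regular expressions. This is only a sketch under the assumption that the path is later written into a JavaScript string literal; the class name BackslashReplaceDemo and both method names are hypothetical.

```java
public class BackslashReplaceDemo {
    // Doubles every backslash so the text can sit inside a JavaScript string literal.
    static String escapeBackslashes(String raw) {
        // String.replace treats both arguments as plain text, not as a regex.
        return raw.replace("\\", "\\\\");
    }

    // Puts a space between a backslash and a following letter u (the idea from Que 2).
    static String separateBackslashU(String raw) {
        return raw.replace("\\u", "\\ u");
    }

    public static void main(String[] args) {
        String path = "\\sample\\user_data\\example";
        System.out.println(escapeBackslashes(path));
        System.out.println(separateBackslashU(path));
    }
}
```

The first line printed is \\sample\\user_data\\example and the second is \sample\ user_data\example. replaceAll would also work, but it takes a regular expression, so the backslash has to be escaped once for the regex and once more for the Java literal, which makes four backslashes in source.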

As for a solution in JavaScript, I am also interested to hear about it.


Another related important point:

Writing the Unicode escapes \u000A (newline) and \u000D (return) inside a String literal or a comment produces a compile error, because they are translated early and interpreted, literally, as end-of-line.
Always use the escape sequences '\n' and '\r' instead.

