Problem Statement:
I have a string that contains a DOS path, something like "\sample\user_data\example". The "\u" (backslash u) in "\user_data" causes an "invalid Unicode" JavaScript error in IE, and hence my page isn't displayed. Since I was not able to escape it, I tried to replace "\u" in the string with something like "\ u" (backslash, space, u). This does not work either: the Java compiler does not accept a bare "\u" and gives an "Invalid unicode character sequence" error when I use it with replaceAll.
Que 1: Can I escape the \u sequence somehow in Java or JavaScript?
Que 2: How can I replace every "\u" in a string with something else, like "\ u"?
I posted this query to some of the Java and JavaScript related groups. Here are the solutions I found.
Before going into the details of the solutions, let us understand Unicode in brief.
What is a Unicode Character?
In Java, the char type holds a 16-bit Unicode value (a UTF-16 code unit)
Unicode is a superset of the ASCII character set, which allows non-English language characters
Any such character can be written as a literal using the escape character (backslash \) and its hexadecimal representation
'\udddd' // where each 'd' is a hex digit (0 - F)
Single characters are represented within single quotes
'a' // char literal
'9' // char literal
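As a small sketch of the two literal forms above, the following shows that a Unicode escape and the plain character produce the same char value. The class name UnicodeLiteralDemo and the helper hex are made up for this illustration only.

```java
final class UnicodeLiteralDemo {
    // 'A' written as a Unicode escape: hexadecimal value 0041.
    static final char ESCAPED_A = '\u0041';
    // The same character written directly.
    static final char PLAIN_A = 'A';

    // Formats a char as the four hex digits used in a Unicode escape.
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_A == PLAIN_A); // true
        System.out.println(hex('9'));             // 0039
    }
}
```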
There are three exceptions that require the use of the Escape character
Single quote ' \' ' displays as '
Double quote ' \" ' displays as "
Backslash ' \\ ' displays as \
There are certain special characters which can be represented by escape sequences
Esc Char    Unicode Char    Definition
\n          \u000A          newline
\t          \u0009          tab
\b          \u0008          backspace
\r          \u000D          return
\f          \u000C          form feed
\ddd                        octal value
Octal character constants can have three digits or less (\000 through \377)
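The escape sequences listed above can be checked with a short sketch like the one below. The class and member names are hypothetical, chosen only for this example; the comparisons simply restate the table.

```java
final class EscapeSequenceDemo {
    // The three characters that must be escaped: single quote, double quote, backslash.
    // Inside a String literal only the double quote and the backslash need the escape.
    static final String QUOTES_AND_BACKSLASH = "'\"\\";

    // Each escape sequence equals the code point given in the table.
    static boolean escapesMatchCodePoints() {
        return '\n' == 0x000A
            && '\t' == 0x0009
            && '\b' == 0x0008
            && '\r' == 0x000D
            && '\f' == 0x000C;
    }

    // Octal escape: 101 octal is 65 decimal, which is 'A'.
    static char octalA() {
        return '\101';
    }

    public static void main(String[] args) {
        System.out.println(escapesMatchCodePoints());      // true
        System.out.println(octalA());                      // A
        System.out.println(QUOTES_AND_BACKSLASH.length()); // 3
    }
}
```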
Now on to the solution approaches:
One obvious solution would be:
The \ symbol is used for escaping special characters. If you want a
path like the one you describe, you should escape the backslashes
themselves, like this:
\\sample\\user_data\\example
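This works only if every backslash is really doubled in the source; a single, unescaped \u anywhere will still not work.
Reason:
The Java compiler translates Unicode escapes at the very start of the compiling process, before it recognizes string literals or comments.
When the compiler encounters \u (backslash u) with an unescaped backslash, it assumes that a Unicode escape follows, since a Unicode escape starts with \u, and it expects four hexadecimal digits after the \u. "\user_data" therefore fails to compile, while in "\\user_data" the second backslash is preceded by another backslash and is not treated as the start of an escape.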
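To illustrate the doubled-backslash form in a Java string literal, here is a minimal sketch. The class name DoubledBackslashDemo is hypothetical; the path is the one from the problem statement. Note that a lone backslash followed by u must be avoided even inside Java comments, because the compiler translates Unicode escapes there too.

```java
final class DoubledBackslashDemo {
    // Every backslash is doubled in the source, so the compiler never sees
    // a lone backslash followed by the letter u.
    static final String PATH = "\\sample\\user_data\\example";

    public static void main(String[] args) {
        // At run time each doubled backslash is a single character.
        System.out.println(PATH.length());          // 25
        System.out.println(PATH.charAt(7) == '\\'); // true
        System.out.println(PATH.charAt(8));         // u
    }
}
```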
If you are looking for a solution in Java, it is straightforward:
walk the string one character at a time, collect the result in a StringBuffer,
and write each control character out as a \uXXXX escape sequence.
Sample code:
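String s = "your string"; // write a literal backslash as "\\" in Java source
StringBuffer sbuffer = new StringBuffer();
for (int i = 0; i < s.length(); i++) {
    char ch = s.charAt(i);
    if (ch <= '\u001F') {
        // control character: emit it as a backslash-u escape with four hex digits
        String ss = Integer.toHexString(ch);
        sbuffer.append("\\u");
        // left-pad the hex value with zeros to four digits
        for (int k = 0; k < 4 - ss.length(); k++) {
            sbuffer.append('0');
        }
        sbuffer.append(ss.toUpperCase());
    } else {
        sbuffer.append(ch);
    }
}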
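For Que 2, one possible approach (a sketch, not something the groups suggested) is to double every backslash at run time before the path is written into a quoted JavaScript string, so that the browser reads "\\user_data" instead of "\user_data". The class JsPathEscaper and the method escapeBackslashes are hypothetical names. The sketch handles backslashes only and assumes the path contains no quotes or line breaks.

```java
final class JsPathEscaper {
    // Doubles each backslash in the given text.
    static String escapeBackslashes(String raw) {
        // String.replace works on literal text, not regular expressions,
        // so the backslash needs only source-level escaping here.
        return raw.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String path = "\\sample\\user_data\\example";
        // Build the line of JavaScript that would be written into the page.
        String js = "var path = \"" + escapeBackslashes(path) + "\";";
        System.out.println(js);
    }
}
```

With replaceAll the same substitution needs a second level of escaping, because its first argument is a regular expression and a backslash is also special in the replacement string; String.replace avoids both.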
If you are looking for a solution in JavaScript, I am also interested to hear about it.
Another related important point:
Using the Unicode escapes \u000A (newline) and \u000D (return) in a String literal or a single-line comment typically produces a compile error. The compiler translates them before anything else, so they are interpreted, literally, as the end of the source line.
Always use the escape sequences '\n' and '\r' instead.