commons-user mailing list archives

From "E. Michael Akerman" <m...@exchange.uark.edu>
Subject Re: [lang] StringEscapeUtils.unescapeHtml("&nbsp;") doesn't return a space
Date Thu, 26 Aug 2010 15:14:01 GMT
I'm not certain how StringEscapeUtils handles it, but in HTML land, &nbsp; should be equal to character
160 (U+00A0, the non-breaking space) instead of 32 (the ordinary space). It has a different
meaning than a space.
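
To make the difference concrete, here is a minimal sketch that uses only the JDK. It assumes that unescapeHtml("&nbsp;") yields the single character U+00A0, so the literal "\u00A0" stands in for that call. The class name NbspCheck is made up for illustration.

```java
public class NbspCheck {
    // Stand-in for StringEscapeUtils.unescapeHtml("&nbsp;");
    // assumption: that call yields the single character U+00A0.
    static final String NBSP = "\u00A0";

    public static void main(String[] args) {
        char c = NBSP.charAt(0);

        System.out.println((int) c);                    // 160
        System.out.println((int) ' ');                  // 32
        System.out.println(" ".equals(NBSP));           // false: different code points

        // Java deliberately excludes non-breaking spaces from isWhitespace,
        // although U+00A0 is still a Unicode space separator.
        System.out.println(Character.isWhitespace(c));  // false
        System.out.println(Character.isSpaceChar(c));   // true
    }
}
```

So the two characters look alike on screen, but equals(), trim() and \s all treat them differently.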

Michael Akerman
Systems Analyst
University IT Services

----- Original Message ----- 
From: "Vitor Costa" <fvitorc@yahoo.com.br>
To: <user@commons.apache.org>
Sent: Wednesday, August 25, 2010 4:50 PM
Subject: [lang] StringEscapeUtils.unescapeHtml("&nbsp;") doesn't return a space


Hi,

I am writing a crawler to get some info on web pages and I am using commons lang
to unescape the html file.
I was having some problems with my regular expressions until I realized that the
following prints false:

System.out.println(" ".equals(StringEscapeUtils.unescapeHtml("&nbsp;")));

Is this a bug? Or is it the expected behavior of the unescape method when
dealing with escaped space characters?


Also, if I unescape 'sbrubbles&nbsp;' and then trim() it, the space still
appears at the end of the string.
Visually speaking, unescaping '&nbsp;' returns a space. But programmatically
speaking, the system doesn't recognize it as a space character.
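
For reference, the trim() behaviour described above can be reproduced, and worked around, with plain JDK calls. This is only a sketch: the literal "sbrubbles\u00A0" stands in for the unescaped string (assuming &nbsp; becomes U+00A0), and NbspNormalizer and normalizeSpaces are hypothetical names, not part of commons lang.

```java
public class NbspNormalizer {
    /**
     * Hypothetical helper: replaces every non-breaking space (U+00A0)
     * with an ordinary space (U+0020) so that trim() and \s apply to it.
     */
    static String normalizeSpaces(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Stand-in for StringEscapeUtils.unescapeHtml("sbrubbles&nbsp;")
        String unescaped = "sbrubbles\u00A0";

        // trim() only strips characters <= U+0020, so U+00A0 survives.
        System.out.println(unescaped.trim().length());                    // 10

        // After normalizing, trim() removes the trailing space.
        System.out.println("[" + normalizeSpaces(unescaped).trim() + "]"); // [sbrubbles]
    }
}
```

If replacing the character is not desirable, a regular expression can instead match both kinds of space explicitly with a character class such as [\s\u00A0].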

Thanks in advance,
Vitor.




