commons-user mailing list archives

From Vitor Costa <fvit...@yahoo.com.br>
Subject [lang] StringEscapeUtils.unescapeHtml("&nbsp;") doesn't return a space
Date Wed, 25 Aug 2010 21:50:45 GMT
Hi,

I am writing a crawler to get some info from web pages and I am using Commons Lang
to unescape the HTML.
I was having some problems with my regular expressions until I realized that the
following prints false:

System.out.println(" ".equals(StringEscapeUtils.unescapeHtml("&nbsp;")));

Is this a bug? Or is it the expected behavior of the unescape method when 
dealing with escaped space characters?


Also, if I unescape 'sbrubbles&nbsp;' and then trim() it, the space still
appears at the end of the string.
Visually speaking, unescaping '&nbsp;' returns a space. But programmatically
speaking, the system doesn't recognize it as a space character.
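
Here is a minimal sketch that reproduces both symptoms without Commons Lang on the classpath. It assumes that unescapeHtml("&nbsp;") yields the no-break space character U+00A0 rather than the ordinary space U+0020. The class name NbspDemo and the helper normalize() are made up for illustration and are not part of any library.

```java
public class NbspDemo {
    // U+00A0 NO-BREAK SPACE: assumed to be what unescapeHtml("&nbsp;") returns.
    static final String NBSP = "\u00A0";

    // Hypothetical helper: turn no-break spaces into ordinary spaces, then trim.
    static String normalize(String s) {
        return s.replace('\u00A0', ' ').trim();
    }

    public static void main(String[] args) {
        // Stand-in for the result of unescaping 'sbrubbles&nbsp;'.
        String unescaped = "sbrubbles" + NBSP;

        // An ordinary space (U+0020) is a different character from U+00A0.
        System.out.println(" ".equals(NBSP));              // false

        // trim() only strips characters up to U+0020, so U+00A0 survives.
        System.out.println(unescaped.trim().length());     // 10

        // After replacing U+00A0 with a plain space, trim() removes it.
        System.out.println(normalize(unescaped).length()); // 9
    }
}
```

If that assumption holds, replacing U+00A0 before trimming or matching (as normalize() does) is one possible workaround.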

Thanks in advance,
Vitor.


      