tomcat-users mailing list archives

From "Mark Benussi" <mark_benu...@hotmail.com>
Subject [OT Friday] Parse HTML file to underlying text
Date Sat, 03 Sep 2005 08:24:47 GMT
I know I missed the Friday deadline, but...

Does anyone have any recommendations for parsing HTML? I use Lucene, and the example that comes with it has its own HTML parser, but I was wondering whether anyone has used an existing project for this, or whether there is some built-in functionality in an Apache library to convert

<p>Hello <i>World</i></p>

to

Hello World

Your thoughts are appreciated.
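For reference, one option that needs nothing beyond the JDK is the HTML parser that ships with Swing (`javax.swing.text.html.parser.ParserDelegator`, driven through an `HTMLEditorKit.ParserCallback`). Note that this is not Lucene's parser and not an Apache library. The sketch below collects the character data the parser reports, treats flow-breaking tags such as `<p>` as word separators, and collapses whitespace. The class name `HtmlToText` and method `toText` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: strips markup using the JDK's Swing HTML parser.
public class HtmlToText {

    /** Returns the text content of an HTML string, with markup removed. */
    public static String toText(String html) {
        final StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Character data between tags; inline tags such as <i> are skipped.
                out.append(data);
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Block-level tags (<p>, <div>, <li>, ...) separate words.
                if (tag.breaksFlow()) {
                    out.append(' ');
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag.breaksFlow()) {
                    out.append(' ');
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br>.
                if (tag.breaksFlow()) {
                    out.append(' ');
                }
            }
        };

        try {
            // true = ignore any charset declared in a <meta> tag; the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Collapse the separators and any leftover whitespace into single spaces.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(toText("<p>Hello <i>World</i></p>"));
    }
}
```

Swing's parser is built around an HTML 3.2 DTD, so newer or badly broken markup may be handled loosely. For messy real-world pages, a dedicated HTML parsing library is probably a better fit than this sketch.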

