lucene-java-user mailing list archives

From Bartosch Warzecha <b.warze...@babiel.com>
Subject HTML parser??
Date Tue, 03 May 2005 08:35:38 GMT

Hello,

I'm building a search engine for HTML documents, and I've run into an
HTML parsing problem.

The documents are in German. They contain various special characters, and
these are written in several different ways, such as "ö", "&ouml;" and
"&#246;". Does anybody know of a parsing engine that handles all of these
spellings without problems?
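
To make the problem concrete, here is a minimal sketch of the normalization that is needed before indexing: all three spellings should end up as the same character. It is only an illustration, not an existing library. The class name EntityNormalizer, the method decode and the tiny entity table are hypothetical, it assumes Java 9 or later, and a real HTML parser would have to cover the full list of named entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps "ö", "&ouml;" and "&#246;" to the same character.
final class EntityNormalizer {

    // Assumption: a deliberately tiny table with only the German letters.
    // Unicode escapes keep the source independent of the file encoding.
    private static final Map<String, String> NAMED = Map.of(
            "auml", "\u00e4", "ouml", "\u00f6", "uuml", "\u00fc",
            "Auml", "\u00c4", "Ouml", "\u00d6", "Uuml", "\u00dc",
            "szlig", "\u00df");

    // Matches &name;, &#246; and &#xF6;. The trailing semicolon is optional
    // because some documents leave it out.
    private static final Pattern REFERENCE = Pattern.compile(
            "&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);?");

    static String decode(String text) {
        Matcher m = REFERENCE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    // Hexadecimal numeric reference, e.g. &#xF6;
                    int codePoint = Integer.parseInt(body.substring(2), 16);
                    replacement = new String(Character.toChars(codePoint));
                } else if (body.startsWith("#")) {
                    // Decimal numeric reference, e.g. &#246;
                    int codePoint = Integer.parseInt(body.substring(1));
                    replacement = new String(Character.toChars(codePoint));
                } else {
                    // Named reference; unknown names are left untouched.
                    replacement = NAMED.getOrDefault(body, m.group());
                }
            } catch (IllegalArgumentException e) {
                // Overflowing or invalid code point: keep the original text.
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, EntityNormalizer.decode("sch&ouml;n") and EntityNormalizer.decode("sch&#246;n") both return the same string as the literal "schön", so an indexer would see one term instead of three.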

Thanks

b.warzecha


