lucene-java-user mailing list archives

From Damian Gajda
Subject Re: HTML parser??
Date Tue, 03 May 2005 15:30:36 GMT

> These documents are in German. They contain various special
> characters, written in different ways, such as "ö",
> "&ouml;" and "&#246". Does anybody know a parsing engine that has no problems
> with all these different ways of writing special characters?

I've created a component for parsing HTML entities (special characters).
It is part of the ObjectLedge project and is stored in the components
subproject. Please feel free to use it. It is licensed under a BSD
(Apache-like) license. You will need to check out the ledge-components
CVS module.
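
For illustration only, here is a minimal sketch of what normalizing the three
spellings from the question could look like in plain Java. This is not the
ObjectLedge component's code or API. The class name EntityDecoder, the decode
method and the small entity table are hypothetical, and a real decoder would
need the complete HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ObjectLedge or Lucene.
final class EntityDecoder {

    // Deliberately tiny table (assumption): only a few German entities plus &amp;.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("auml", "\u00e4");
        NAMED.put("ouml", "\u00f6");
        NAMED.put("uuml", "\u00fc");
        NAMED.put("Auml", "\u00c4");
        NAMED.put("Ouml", "\u00d6");
        NAMED.put("Uuml", "\u00dc");
        NAMED.put("szlig", "\u00df");
        NAMED.put("amp", "&");
    }

    // Matches &#xHH;, &#NNN; and &name; and tolerates a missing semicolon.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);?");

    // Replaces every recognized entity with its character; unknown ones stay as they are.
    public static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = resolve(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown entity: keep the original text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded character(s), or null if the reference is not recognized.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.length() > 1
            && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
        try {
            int codePoint = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }
}
```

With this sketch, EntityDecoder.decode("sch&ouml;n, sch&#246;n, sch&#246n")
gives "schön, schön, schön", so all three spellings end up as the same
character before the text is indexed. Text that only looks like an entity,
such as the "&D" in "R&D", is left untouched because it is not in the table.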

You are also welcome to use ObjectLedge as a whole :)

Damian Gajda
