lucene-java-user mailing list archives

From Damian Gajda <dga...@caltha.pl>
Subject Re: HTML parser??
Date Tue, 03 May 2005 15:30:36 GMT
Hello,

> These documents are in German. They contain various special
> characters, written in different ways, like "ö", "&ouml;" and
> "&#246". Does anybody know a parsing engine that has no problems
> with all these different ways of writing special characters?

I've created a component for parsing HTML entities (special characters).
This component is part of the ObjectLedge project; it is stored in the
components subproject. Please feel free to use it. It is licensed under
a BSD (Apache-like) license. You will need to check out the
ledge-components CVS module.

http://objectledge.org/
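For anyone who wants to see the idea without pulling the CVS module, here is a minimal sketch in plain Java. It is hypothetical and is NOT the ObjectLedge API: the class name `EntityDecoder`, its `decode` method and the small entity table are assumptions made for illustration. It maps a named entity ("&ouml;"), a decimal reference with or without the trailing semicolon ("&#246") and a hex reference ("&#xF6;") to the same Unicode character, leaving unknown entities untouched, so that all spellings end up identical before indexing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Hypothetical, deliberately small table: only a few entities relevant
    // to German text. A real decoder would cover the full HTML entity set.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("auml", 0xE4);
        NAMED.put("ouml", 0xF6);
        NAMED.put("uuml", 0xFC);
        NAMED.put("Auml", 0xC4);
        NAMED.put("Ouml", 0xD6);
        NAMED.put("Uuml", 0xDC);
        NAMED.put("szlig", 0xDF);
        NAMED.put("amp", 0x26);
        NAMED.put("lt", 0x3C);
        NAMED.put("gt", 0x3E);
        NAMED.put("quot", 0x22);
        NAMED.put("nbsp", 0xA0);
    }

    // Group 1: decimal reference (&#246), group 2: hex reference (&#xF6),
    // group 3: named entity (&ouml). The trailing semicolon is optional so
    // that sloppy input such as "&#246" is accepted too.
    private static final Pattern ENTITY = Pattern.compile(
        "&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));?");

    public static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Default: keep the original text for entities we cannot resolve.
            String replacement = m.group();
            try {
                int codePoint = -1;
                if (m.group(1) != null) {
                    codePoint = Integer.parseInt(m.group(1));
                } else if (m.group(2) != null) {
                    codePoint = Integer.parseInt(m.group(2), 16);
                } else if (NAMED.containsKey(m.group(3))) {
                    codePoint = NAMED.get(m.group(3));
                }
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Number too large for an int: leave the reference as it was.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal character and three entity spellings of the same letter.
        String[] variants = {"\u00f6", "&ouml;", "&#246", "&#xF6;"};
        for (String v : variants) {
            // Prints "true" for each variant: all decode to the same "ö".
            System.out.println(decode(v).equals("\u00f6"));
        }
    }
}
```

Note that this only deals with entities: a literal "ö" comes through correctly only if the file was read with the character encoding the document actually uses (for example ISO-8859-1 or UTF-8); otherwise it turns into garbage before any entity decoding happens.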

You are also welcome to use ObjectLedge as a whole :)

Regards,
-- 
Damian Gajda



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

