lucene-java-user mailing list archives

From Keith Gunn <kg...@csd.abdn.ac.uk>
Subject Re: Parsers
Date Sat, 24 Aug 2002 14:54:57 GMT
Although no one else seems to have come across any problems, the HTML
parser that came with Lucene did not operate efficiently enough for me, so
I found an alternative at http://htmlparser.sourceforge.net
For simple text, the standard Java API is all you need to read the files.
For PDF, several APIs are available; I recommend www.pdfbox.org
I had no luck finding APIs for MS Word or RTF, but there are plenty of
tools that can do the job.
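
For the plain text case, something along these lines should be enough. It is
only a rough sketch: the class and method names (PlainTextReader, readText)
are made up for illustration, it assumes the files are UTF-8, and it uses the
java.nio.file classes from a current JDK. The string it returns is what you
would then hand to your indexing code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene or any other library.
class PlainTextReader {

    // Reads the whole file into one String, normalising line endings to '\n'.
    // Assumption: the file is UTF-8 encoded; change the charset if yours is not.
    static String readText(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the terminator, so add one back per line.
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Call it as PlainTextReader.readText(Paths.get("notes.txt")), where the file
name is just a placeholder. Reading line by line keeps it simple; for very
large files you would probably want to pass a Reader along instead of
building one big String.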




On Sat, 24 Aug 2002, Pradeep Kumar K wrote:

> Hi friends
>
> I need parsers for the following file formats
> 1. HTML
> 2. PDF
> 3. MSWord
> 4. RTF
> 5. Simple text
>
> Has anybody developed parsers (in Java) for all or any of these file formats?
> If you have, please send me the links so that I can download them.
>
> Thanks in Advance
> Pradeep
>
>
> --------------------------------------------------------------
> Robosoft Technologies - Partners in Product Development
>
>
>
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>
>



