lucene-java-user mailing list archives

From Joshua O'Madadhain
Subject Re: Parsers
Date Sat, 24 Aug 2002 05:16:48 GMT
On Sat, 24 Aug 2002, Pradeep Kumar K wrote:

> Hi friends,
> I need parsers for the following file formats:
> 1. HTML
> 2. PDF
> 3. MSWord
> 4. RTF
> 5. Simple text
> Has anybody developed parsers (in Java) for all or any of these file formats?
> If you have, please send me the links so that I can download them.

A simple HTML parser is part of the download package (one of the
examples).  Check the contrib section on the Lucene web page; I believe a
couple of different PDF parsers are there, and perhaps others.

Not sure what you mean by a "simple text" parser.  Do you mean something
more complicated than what you can do with StringTokenizer?
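
For reference, here is a minimal sketch of the StringTokenizer approach for plain text. The class name, the tokenize helper, and the delimiter set are my own assumptions for illustration; none of them come from Lucene, and a real indexing setup would more likely hand the text to one of Lucene's analyzers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of Lucene: splits plain text into lowercase tokens.
class SimpleTextTokens {

    // Assumed delimiter set: whitespace plus a few common punctuation marks.
    private static final String DELIMITERS = " \t\n\r\f.,;:!?()\"";

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            // Lowercase each token so that "Lucene" and "lucene" compare equal.
            tokens.add(st.nextToken().toLowerCase());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello, lucene, users, plain, text, is, easy]
        System.out.println(tokenize("Hello, Lucene users! Plain text is easy."));
    }
}
```

If this is all "simple text" needs to mean, no separate parser is required; anything beyond it (stemming, stop words, and so on) is a question of analysis rather than parsing.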

Joshua O'Madadhain
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
