lucene-java-user mailing list archives

From Joshua O'Madadhain <jmad...@ics.uci.edu>
Subject Re: Parsers
Date Sat, 24 Aug 2002 05:16:48 GMT
On Sat, 24 Aug 2002, Pradeep Kumar K wrote:

> Hi friends
> 
> I need parsers for the following file formats
> 1. HTML
> 2. PDF
> 3. MSWord
> 4. RTF
> 5. Simple text
> 
> Has anybody developed parsers (in Java) for all or any of these file
> formats? If so, please send me the links so that I can download them.

A simple HTML parser is part of the download package (one of the
examples).  Check the contrib section on the Lucene web page; I believe a
couple of different PDF parsers are there, and perhaps others.

Not sure what you mean by a "simple text" parser.  Do you mean something
more complicated than what you can do with StringTokenizer?
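
For plain text, something along these lines may be all that is needed. This
is only a minimal sketch: the class name, the method name, and the delimiter
set are assumptions chosen for illustration, not anything that ships with
Lucene. It splits a string into lowercased tokens with
java.util.StringTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not part of Lucene.
public class SimpleTextTokens {
    // Assumed delimiter set: whitespace plus some common punctuation.
    private static final String DELIMITERS = " \t\n\r\f.,;:!?()\"";

    // Splits the text on any delimiter character and lowercases each token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().toLowerCase());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [plain, text, needs, no, parser, only, a, tokenizer]
        System.out.println(tokenize("Plain text needs no parser, only a tokenizer."));
    }
}
```

Runs of adjacent delimiters produce no empty tokens, because StringTokenizer
skips them by default. If the text needs stemming, stop words, or
language-aware splitting, that is a job for a Lucene Analyzer rather than
for this kind of helper.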

Joshua O'Madadhain

 jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.