lucene-java-user mailing list archives

From "Strittmatter Stephan (external)" <Stephan.Strittmatter....@kst.siemens.de>
Subject RE: Lucene has moved to Jakarta
Date Wed, 10 Oct 2001 11:13:55 GMT
Hello,

I think PDF indexing would be the most valuable feature after HTML
for any search engine used on the web!

I am also very interested in indexing PDFs! If anybody has
ideas or supporting libraries, I would be glad to help implement it!

Greetings, Stephan

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Friday, October 05, 2001 11:19 PM
> To: 'William Wong'; Lucene-user
> Subject: RE: Lucene has moved to Jakarta
> 
> 
> > From: William Wong [mailto:keng.wong@verizon.net]
> > 
> > How about adding filters for different file types such as
> > -HTML (there is one in the demo already)
> > -XML
> > -PDF
> > -MsWord/RTF
> > -other common file formats
> 
> These would be great.  Who will implement them?
> I was only listing tasks that I plan to do.
> 
> I think the best API for such converters is a method that takes a
> java.io.InputStream and returns a java.io.Reader containing plain text,
> e.g.:
>      public static java.io.Reader getText(java.io.InputStream);
> That way they can easily be used by Lucene analyzers.
> 
> Should we put converters in org.apache.lucene.document?
> 
> Contributions anyone?
> 
> Doug
> 
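
To make the converter API from the quoted message concrete, here is a
minimal sketch of a method that takes a java.io.InputStream and returns a
java.io.Reader of plain text. The class name NaiveMarkupConverter, the
UTF-8 assumption, and the crude tag stripping are all hypothetical choices
for illustration; none of this is part of Lucene, and a real HTML or PDF
converter would need a proper parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical converter: the name and the stripping logic are assumptions
// made for this example, not an existing Lucene class.
public class NaiveMarkupConverter {

    // Reads the whole stream as UTF-8 (an assumption about the input),
    // drops every character between '<' and '>', and returns what is left
    // as a Reader of plain text.
    public static Reader getText(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader raw = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            boolean insideTag = false;
            int c;
            while ((c = raw.read()) != -1) {
                if (c == '<') {
                    insideTag = true;        // start skipping markup
                } else if (c == '>') {
                    insideTag = false;       // markup ended, resume copying
                } else if (!insideTag) {
                    text.append((char) c);   // keep plain-text characters
                }
            }
        }
        return new StringReader(text.toString());
    }
}
```

Because the result is an ordinary Reader, it could be handed to any code
that consumes character streams, which is the property the quoted message
is after. Buffering the whole document in memory keeps the sketch short; a
streaming implementation would be preferable for large files.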
