lucene-java-user mailing list archives

From jay ! <cyberja...@yahoo.com>
Subject RE: Lucene has moved to Jakarta
Date Sat, 06 Oct 2001 11:15:21 GMT
How does i2 do it? See http://www.i2a.com/websearch/ :
they list both an HTML parser and a PDF parser as part
of their solution.

J

--- Doug Cutting <DCutting@grandcentral.com> wrote:
> > From: William Wong [mailto:keng.wong@verizon.net]
> > 
> > How about adding filters for different file types such as
> > -HTML (there is one in the demo already)
> > -XML
> > -PDF
> > -MsWord/RTF
> > -other common file formats
> 
> These would be great.  Who will implement them?
> I was only listing tasks that I plan to do.
> 
> I think the best API for such converters is a method that takes a
> java.io.InputStream and returns a java.io.Reader containing plain text,
> e.g.:
>      public static java.io.Reader getText(java.io.InputStream);
> That way they can easily be used by Lucene analyzers.
> 
> Should we put converters in org.apache.lucene.document?
> 
> Contributions anyone?
> 
> Doug
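
To make the converter shape Doug describes concrete, here is a rough
sketch of an HTML converter: a static method that takes a
java.io.InputStream and returns a java.io.Reader over plain text. The
class name HtmlTextConverter, the UTF-8 assumption, and the naive
tag-stripping loop are all illustrative assumptions, not anything
proposed in the thread, and the code uses current Java syntax rather
than what was available in 2001.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical converter class; the name does not come from the thread. */
final class HtmlTextConverter {

    private HtmlTextConverter() {
        // Static utility, not meant to be instantiated.
    }

    /**
     * Reads HTML from the stream and returns a Reader over its plain text.
     * Naive on purpose: it drops everything between '<' and '>', does not
     * decode entities, and assumes the input is UTF-8 (an assumption).
     */
    public static Reader getText(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader html = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            boolean inTag = false;
            int c;
            while ((c = html.read()) != -1) {
                if (c == '<') {
                    inTag = true;          // start skipping markup
                } else if (c == '>') {
                    inTag = false;         // markup ended, resume copying
                } else if (!inTag) {
                    text.append((char) c); // keep characters outside tags
                }
            }
        }
        // The whole document is buffered in memory; a real converter
        // would probably stream instead.
        return new StringReader(text.toString());
    }
}
```

Because the result is an ordinary Reader, it could be handed to any
indexing code that accepts one. Converters for PDF or Word would keep
the same signature and differ only in how they extract the text.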

