lucene-dev mailing list archives

From Tod Thomas
Subject Re: Parser Question
Date Wed, 16 Jul 2003 12:00:39 GMT
Peter Becker wrote:

> Leo Galambos wrote:
> > Peter Becker wrote:
> >
> >> Hi Tod,
> >>
> >> as far as I know Lucene itself doesn't offer this (at least we failed
> >> to find it). The closest thing available seems to be the Ant tasks.
> >>
> >> We are currently working on introducing this notion for our program,
> >> which is open source. Besides the plugin mechanism there will be a
> >> file filter mapping and a thread mechanism to maintain an index as
> >> well as implementations using POI and Multivalent. Give us another
> >> week or two.
> >
> >
> > Unfortunately, I didn't get this. Could you explain the mechanism,
> > please? Thank you
> Not fully yet, since we are still working on it ;-) You can find the
> code in our CVS repository on Sourceforge:
> The idea is that you have to supply different parsers for different
> formats, then turn the results found into Lucene Document objects. At
> the moment we do this using a normal interface similar to the one used
> in the Java Ant tasks (see the "handlers" directory), but we want to
> turn it into a plugin interface. Our tool should in the end do TXT, HTML
> and XML out of the box and have at least three plugin implementations:
>   - POI for .doc, .xls
>   - PDFbox for .pdf
>   - Multivalent for .pdf, .dvi and others
> The plugin API will be extremely simple and it should fit easily with
> the Ant tasks, so you should be able to wrap our code into an Ant task
> or whatever interface you need.

This sounds really cool.  If I'm reading you correctly, porting parsers written in Java
for existing file formats to your plugin architecture should be a fairly straightforward
exercise.  Is that accurate?
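
To make the question concrete, here is a rough sketch of how I picture the pieces
fitting together: one parser per format, plus a mapping from file extension to parser.
Every name in it (DocumentParser, PlainTextParser, ParserRegistry) is my own invention
and not taken from your CVS code, and a plain Map of field names to text stands in for
the Lucene Document so that the sketch compiles without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical plugin contract: one implementation per file format.
interface DocumentParser {
    /** File extensions (lower case, without the dot) this parser accepts. */
    String[] extensions();

    /** Extracts named text fields; the caller would copy them into a Lucene Document. */
    Map<String, String> parse(Reader input) throws IOException;
}

// Assumed built-in handler for plain text: the whole input becomes one "contents" field.
class PlainTextParser implements DocumentParser {
    public String[] extensions() {
        return new String[] {"txt"};
    }

    public Map<String, String> parse(Reader input) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = input.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("contents", text.toString());
        return fields;
    }
}

// Assumed "file filter mapping": selects a parser by file extension.
class ParserRegistry {
    private final Map<String, DocumentParser> byExtension = new HashMap<>();

    void register(DocumentParser parser) {
        for (String ext : parser.extensions()) {
            byExtension.put(ext.toLowerCase(Locale.ROOT), parser);
        }
    }

    /** Returns the parser for the file name, or null if no plugin handles it. */
    DocumentParser lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return byExtension.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}

class ParserPluginDemo {
    public static void main(String[] args) throws IOException {
        ParserRegistry registry = new ParserRegistry();
        registry.register(new PlainTextParser());

        DocumentParser parser = registry.lookup("notes.txt");
        Map<String, String> fields = parser.parse(new StringReader("hello lucene"));
        System.out.println(fields.get("contents"));
    }
}
```

If that is roughly the shape, then porting an existing parser would mean wrapping it in
one class that implements the interface and registering it for its extensions.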

