lucene-dev mailing list archives

From Dmitry Serebrennikov
Subject Re: [Help with upcomming submission/RFC] Excel Parser
Date Fri, 18 Jan 2002 18:51:40 GMT
Doug Cutting wrote:

>In my opinion, if the code will be in Lucene's jar file, then any required
>libraries should be in Lucene's lib directory and should be packaged in any
>Lucene distribution.  Of course, folks who don't use these classes won't
>need the POI libs...  Do folks agree with this policy?  Or do people feel
>strongly that anything with external dependencies should be packaged
>separately from the Lucene jar?

I think it's important to keep Lucene core separate, since not all 
applications will need the POI capability. Ideally, I think, Lucene's 
core will have a set of interfaces or base classes for document parsers, 
while specific parsers will implement these interfaces. The mapping can 
be done with a configuration file that specifies a MIME type and a class 
name for the corresponding parser. Or this can be done the way JDBC 
drivers are done, and have the parser classes self-register when they 
are referenced. The classes will then be loaded dynamically (at startup 
to avoid a performance hit during indexing) and used based on the 
document type. This way the parsers can be supplied in a separate jar 
file that will have a dependency on POI (or whatever). This also allows 
use of proprietary parsers or easy integration with other open source 
technology that may be out there. Lucene wouldn't have to ship POI and 
other libraries but it would provide a package, as an optional download, 
that integrates POI parsers into Lucene. The same can be done with PDF 
parsers for example.
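
To make the idea concrete, here is a rough sketch in Java of how the core 
interface and the registry might look. Every name in it (DocumentParser, 
ParserRegistry, PlainTextParser) is hypothetical and nothing like this 
exists in Lucene today; the plain-text parser merely stands in for a 
POI-backed one so that the example has no external dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical parser contract that would live in Lucene's core. */
interface DocumentParser {
    /** Extracts indexable plain text from the raw bytes of a document. */
    String extractText(byte[] content) throws Exception;
}

/** Hypothetical registry that maps MIME types to parser instances. */
final class ParserRegistry {
    private static final Map<String, DocumentParser> PARSERS = new HashMap<>();

    private ParserRegistry() {}

    /** Entry point for parsers that register themselves, JDBC-driver style. */
    static synchronized void register(String mimeType, DocumentParser parser) {
        PARSERS.put(mimeType, parser);
    }

    /**
     * Loads every parser named in the configuration once, at startup.
     * Each entry has the form "MIME type = fully qualified class name".
     */
    static synchronized void loadAll(Properties config)
            throws ReflectiveOperationException {
        for (String mimeType : config.stringPropertyNames()) {
            Class<?> cls = Class.forName(config.getProperty(mimeType));
            DocumentParser parser =
                    (DocumentParser) cls.getDeclaredConstructor().newInstance();
            PARSERS.put(mimeType, parser);
        }
    }

    /** Returns the parser for the given MIME type, or null if none is registered. */
    static synchronized DocumentParser forMimeType(String mimeType) {
        return PARSERS.get(mimeType);
    }
}

/** Stand-in parser; a POI-backed Excel parser would ship in its own jar. */
final class PlainTextParser implements DocumentParser {
    @Override
    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```

With something like this, the core only ever sees DocumentParser, and the 
jar that depends on POI is needed only when the configuration names a 
class from it. For the self-registering variant, a parser class would 
call ParserRegistry.register(...) from a static initializer, and loading 
the class with Class.forName would be enough to make it available.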

Also, I'm not really an expert on open source licenses, but it seems 
that having this framework would allow the use of parsers with various 
licenses because the parser adapters can be licensed separately from the 
main Lucene code. Or is this not needed, since Lucene is under the Apache 
license, which allows greater flexibility?

