lucene-dev mailing list archives

From "Andrew C. Oliver" <>
Subject Re: [Help with upcoming submission/RFC] Excel Parser
Date Fri, 18 Jan 2002 19:12:34 GMT
On Fri, 2002-01-18 at 13:51, Dmitry Serebrennikov wrote:
> Doug Cutting wrote:
> >
> >In my opinion, if the code will be in Lucene's jar file, then any required
> >libraries should be in Lucene's lib directory and should be packaged in any
> >Lucene distribution.  Of course, folks who don't use these classes won't
> >need the POI libs...  Do folks agree with this policy?  Or do people feel
> >strongly that anything with external dependencies should be packaged
> >separately from the Lucene jar?
> >
> I think it's important to keep Lucene core separate, since not all 
> applications will need the POI capability. Ideally, I think, Lucene's 

I have no strong feelings about this one way or another.  I'd suggest it
be easy to set up and configure with the POI functionality.  I'm
personally (currently) unable to use Lucene at most places I contract
because I need Office doc functionality.  As long as it's easy for a
novice to install, that's good with me.

> core will have a set of interfaces or base classes for document parsers, 
> while specific parsers will implement these interfaces. The mapping can 
> be done with a configuration file that specifies a MIME type and a class 
> name for the corresponding parser. Or this can be done the way JDBC
> drivers are done, and have the parser classes self-register when they 
> are referenced. The classes will then be loaded dynamically (at startup 
> to avoid performance hit during indexing), and used based on the 
> document type. This way the parsers can be supplied in a separate jar 
> file that will have a dependency on POI (or whatever). This also allows 
> use of proprietary parsers or easy integration with other open source 
> technology that may be out there. Lucene wouldn't have to ship POI and 
> other libraries but it would provide a package, as an optional download, 
> that integrates POI parsers into Lucene. Same can be done with PDF 
> parsers for example.
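
A minimal sketch of the JDBC-style self-registration described in the quoted paragraph, under stated assumptions: `DocumentParser`, `ParserRegistry`, and `ExcelParser` are hypothetical names that exist in neither Lucene nor POI, and the parser body is a stub where a real adapter would call into POI. It only illustrates how a parser class shipped in a separate jar could register itself when it is first loaded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface that the Lucene core would own (assumption).
interface DocumentParser {
    /** Returns the plain text extracted from the raw document bytes. */
    String extractText(byte[] content);
}

// Hypothetical registry, playing the role DriverManager plays for JDBC.
final class ParserRegistry {
    private static final Map<String, DocumentParser> PARSERS = new ConcurrentHashMap<>();

    private ParserRegistry() {}

    static void register(String mimeType, DocumentParser parser) {
        PARSERS.put(mimeType, parser);
    }

    static boolean isRegistered(String mimeType) {
        return PARSERS.containsKey(mimeType);
    }

    static DocumentParser forMimeType(String mimeType) {
        DocumentParser parser = PARSERS.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + mimeType);
        }
        return parser;
    }
}

// Would live in the optional jar that depends on POI (stubbed here).
final class ExcelParser implements DocumentParser {
    static {
        // Runs once, when the class is first initialized.
        ParserRegistry.register("application/vnd.ms-excel", new ExcelParser());
    }

    @Override
    public String extractText(byte[] content) {
        // Placeholder: a real adapter would delegate to POI here.
        return "(stub) " + content.length + " bytes";
    }
}

final class SelfRegistrationDemo {
    public static void main(String[] args) throws ClassNotFoundException {
        // In practice the class name would come from configuration.
        String className = ExcelParser.class.getName();
        Class.forName(className); // triggers the static initializer above
        DocumentParser parser = ParserRegistry.forMimeType("application/vnd.ms-excel");
        System.out.println(parser.extractText(new byte[] {1, 2, 3}));
    }
}
```

Loading every parser class once at startup, as the quoted text suggests, keeps the class-loading cost out of the indexing loop.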

If this approach is taken, I'd suggest an "optional filters" jar
distribution that can be downloaded and installed into Lucene with
little trouble.
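
For the configuration-file alternative mentioned earlier in the thread (a MIME type mapped to a parser class name), an optional jar could ship a small properties file that the core reads at startup. The sketch below is only an illustration under assumptions: `DocumentParser`, `PlainTextParser`, and `ConfiguredParsers` are hypothetical names, and a plain-text stand-in replaces the POI-backed parser so the example runs without POI on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical interface that the Lucene core would own (assumption).
interface DocumentParser {
    String extractText(byte[] content);
}

// Stand-in parser so the sketch has no external dependencies.
final class PlainTextParser implements DocumentParser {
    public PlainTextParser() {}

    @Override
    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// Hypothetical loader: reads "mime/type=fully.qualified.ClassName" pairs.
final class ConfiguredParsers {
    private final Map<String, DocumentParser> byMimeType = new HashMap<>();

    /** Instantiates every configured parser up front, before indexing starts. */
    static ConfiguredParsers load(Properties config) {
        ConfiguredParsers parsers = new ConfiguredParsers();
        for (String mimeType : config.stringPropertyNames()) {
            String className = config.getProperty(mimeType).trim();
            try {
                Class<? extends DocumentParser> type =
                        Class.forName(className).asSubclass(DocumentParser.class);
                parsers.byMimeType.put(mimeType, type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException(
                        "Cannot load parser " + className + " for " + mimeType, e);
            }
        }
        return parsers;
    }

    DocumentParser forMimeType(String mimeType) {
        DocumentParser parser = byMimeType.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser configured for " + mimeType);
        }
        return parser;
    }
}

final class ConfiguredParsersDemo {
    public static void main(String[] args) {
        // A real setup would load these pairs from a file in the optional jar.
        Properties config = new Properties();
        config.setProperty("text/plain", PlainTextParser.class.getName());

        ConfiguredParsers parsers = ConfiguredParsers.load(config);
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(parsers.forMimeType("text/plain").extractText(bytes));
    }
}
```

With this shape, installing the optional jar would amount to adding it to the classpath and adding one line per document type to the configuration.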

Personally, as a likely user, my preference is for the common basic
document types (HTML, PDF, Office, RTF) to all be included, though I
have no strong feelings.

> Also, I'm not really an expert on open source licenses, but it seems 
> that having this framework would allow the use of parsers with various 
> licenses because the parser adapters can be licensed separately from the 
> main Lucene code. Or is this not needed since Lucene is under Apache 
> license which allows greater flexibility?

POI is also under the APL (the Apache license).


> Dmitry.
> --
> To unsubscribe, e-mail:   <>
> For additional commands, e-mail: <>
-- - port of Excel format to java 
			- fix java generics!

The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh
