Delivered-To: apmail-jakarta-lucene-dev-archive@apache.org
Received: (qmail 50644 invoked from network); 18 Jan 2002 19:20:36 -0000
Received: from unknown (HELO nagoya.betaversion.org) (192.18.49.131)
  by daedalus.apache.org with SMTP; 18 Jan 2002 19:20:36 -0000
Received: (qmail 17783 invoked by uid 97); 18 Jan 2002 19:20:39 -0000
Delivered-To: qmlist-jakarta-archive-lucene-dev@jakarta.apache.org
Received: (qmail 17767 invoked by uid 97); 18 Jan 2002 19:20:39 -0000
Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
List-Id: "Lucene Developers List"
Reply-To: "Lucene Developers List"
Delivered-To: mailing list lucene-dev@jakarta.apache.org
Received: (qmail 17756 invoked from network); 18 Jan 2002 19:20:38 -0000
Subject: Re: [Help with upcomming submission/RFC] Excel Parser
From: "Andrew C. Oliver"
To: Lucene Developers List
In-Reply-To: <3C486EBC.3080607@earthlink.net>
References: <4BC270C6AB8AD411AD0B00B0D0493DF0EE7E4F@mail.grandcentral.com>
  <3C486EBC.3080607@earthlink.net>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Evolution/1.0.0.99+cvs.2001.12.18.08.57 (Preview Release)
Date: 18 Jan 2002 14:12:34 -0500
Message-Id: <1011381154.5120.30.camel@linux2.superlinksoftware.com>
Mime-Version: 1.0
X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N

On Fri, 2002-01-18 at 13:51, Dmitry Serebrennikov wrote:
> Doug Cutting wrote:
>
> >
> >In my opinion, if the code will be in Lucene's jar file, then any required
> >libraries should be in Lucene's lib directory and should be packaged in any
> >Lucene distribution.  Of course, folks who don't use these classes won't
> >need the POI libs...  Do folks agree with this policy?  Or do people feel
> >strongly that anything with external dependencies should be packaged
> >separately from the Lucene jar?
> >
> I think it's important to keep Lucene core separate, since not all
> applications will need the POI capability. Ideally, I think, Lucene's

I have no strong feelings about this one way or the other. I'd suggest
that it be easy to set up and configure with the POI functionality.
Personally, I'm currently unable to use Lucene at most places where I
contract because I need Office document support. As long as it's easy
for a novice to install, that's good with me.

> core will have a set of interfaces or base classes for document parsers,
> while specific parsers will implement these interfaces. The mapping can
> be done with a configuration file that specifies a MIME type and a class
> name for the corresponding parser.

+1

> Or this can be done the way JDBC
> drivers are done, and have the parser classes self-register when they
> are referenced. The classes will then be loaded dynamically (at startup
> to avoid performance hit during indexing), and used based on the
> document type. This way the parsers can be supplied in a separate jar
> file that will have a dependency on POI (or whatever). This also allows
> use of proprietary parsers or easy integration with other open source
> technology that may be out there. Lucene wouldn't have to ship POI and
> other libraries but it would provide a package, as an optional download,
> that integrates POI parsers into Lucene. Same can be done with PDF
> parsers for example.
>

If this approach is taken, I'd suggest an "optional filters" jar
distribution that can be downloaded and installed into Lucene with
little trouble. Personally, as a likely user, I'd prefer that parsers
for the common basic document formats (HTML, PDF, Office, RTF) all be
included, though I have no strong feelings.

> Also, I'm not really an expert on open source licenses, but it seems
> that having this framework would allow the use of parsers with various
> licenses because the parser adapters can be licensed separately from the
> main Lucene code.
> Or is this not needed since Lucene is under Apache
> license which allows greater flexibility?
>

POI is also APL.

-Andy

> Dmitry.
>
> --
> To unsubscribe, e-mail:
> For additional commands, e-mail:
>
-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html - fix java generics!

The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh
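
For illustration only, the JDBC-style self-registration idea discussed in
this thread might look roughly like the Java sketch below. Every name in it
(DocumentParser, ParserRegistry, PlainTextParser) is hypothetical; none of
them is part of Lucene or POI, and the thread does not settle on any
particular design. A POI-backed parser would live in a separate jar and
register itself in the same way as the plain-text example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: the core would define only this contract.
interface DocumentParser {
    /** Extracts plain text suitable for indexing from raw document bytes. */
    String extractText(byte[] content);
}

// Hypothetical registry mapping MIME types to parser instances.
final class ParserRegistry {
    private static final Map<String, DocumentParser> PARSERS =
            new ConcurrentHashMap<>();

    private ParserRegistry() {}

    /** Called by parser classes from their static initializers. */
    static void register(String mimeType, DocumentParser parser) {
        PARSERS.put(mimeType.toLowerCase(Locale.ROOT), parser);
    }

    /** Looks up the parser registered for a MIME type. */
    static DocumentParser forMimeType(String mimeType) {
        DocumentParser parser = PARSERS.get(mimeType.toLowerCase(Locale.ROOT));
        if (parser == null) {
            throw new IllegalArgumentException(
                    "No parser registered for " + mimeType);
        }
        return parser;
    }

    /**
     * Loads a parser class by name at startup. Class.forName initializes
     * the class, which runs its static block and registers the parser.
     * The class name could come from a configuration file.
     */
    static void load(String className) {
        try {
            Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Parser class not on classpath: " + className, e);
        }
    }
}

// Example parser that registers itself when its class is initialized.
final class PlainTextParser implements DocumentParser {
    static {
        ParserRegistry.register("text/plain", new PlainTextParser());
    }

    @Override
    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```

An application would call ParserRegistry.load once per parser class at
startup, then ParserRegistry.forMimeType("text/plain") while indexing. The
configuration-file variant from the thread fits the same sketch: read
MIME-type and class-name pairs from a file and pass each class name to load.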