Subject: Re: Proposal for Lucene
From: "Andrew C. Oliver"
To: Lucene Developers List
In-Reply-To: <00c401c1b04c$c7d6a2e0$6401a8c0@darden.virginia.edu>
References: <008801c1b03f$d7a00020$0b01a8c0@168.1.8.Domainrelevanz> <00c401c1b04c$c7d6a2e0$6401a8c0@darden.virginia.edu>
Date: 08 Feb 2002 08:19:18 -0500
Message-Id: <1013174358.20254.46.camel@linux2.superlinksoftware.com>

Is this open source? APL'd? Where can I look at it?

On Thu, 2002-02-07 at 22:00, Erik Hatcher wrote:
> I've developed something similar myself. I've created an Ant task
> that uses DocumentHandler interface implementing classes - one that can be
> used () is a FileExtensionDocumentHandler. At build-time
> I generate a Lucene index of static documents, and roll that into a web
> application.
>
> It's got some kinks, like how to deal with the documents because they contain
> relative hyperlinks...
> so these documents either should be copied into the
> WAR too (or somehow made accessible to the web app) or incorporated directly
> into a Lucene field ("rawcontents" is what I'm using now). These issues are
> not tough to solve and having some additional parameters to my IndexTask
> could allow such things to be customized by the user.
>
> My task is still evolving, but my plan all along has been to donate it to
> lucene-dev for incorporation in some form or another.
>
> Let me know if you'd like it, and what package name you'd like to use.
>
> Erik
>
>
> ----- Original Message -----
> From: "Kelvin Tan"
> To: "Lucene Developers List"
> Sent: Thursday, February 07, 2002 8:27 PM
> Subject: Re: Proposal for Lucene
>
>
> Great suggestions all around, and I'm pretty much in agreement with what's
> been said.
>
> In my app, I've built a mini-framework around the searching such that I'm
> able to map ContentHandlers (which index file contents) to file extensions.
> I've been wanting to clean it up and contribute it for a while, but haven't
> overcome the inertia to do so. I've also introduced a DataSource (which can
> pretty much be anything, like a filesystem, a database, a URL, etc.) from
> which to obtain the data to index, so I think it _could_ be in line with what
> some of you have in mind.
>
> I could also use a lot of feedback on what's been done too...
>
> So what's the plan to move forward?
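Erik's FileExtensionDocumentHandler and Kelvin's mapping of ContentHandlers to file extensions both come down to choosing a handler by a file's extension. The Java below is a minimal sketch of that dispatch idea under stated assumptions: the `ContentHandler` interface, the `ExtensionHandlerRegistry` class and the field names are hypothetical, not taken from either implementation, and no Lucene API is used, so a handler just returns a map of field names to values that a real indexer would turn into a Lucene document.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical handler contract: turns raw file contents into named fields. */
interface ContentHandler {
    Map<String, String> extractFields(String path, String rawContents);
}

/** Hypothetical registry mapping lower-cased file extensions to handlers. */
class ExtensionHandlerRegistry {
    private final Map<String, ContentHandler> handlers = new HashMap<>();

    void register(String extension, ContentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the handler for the path's extension, or null if none is registered. */
    ContentHandler lookup(String path) {
        int dot = path.lastIndexOf('.');
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        if (dot <= slash || dot == path.length() - 1) {
            return null; // the final path segment has no extension
        }
        return handlers.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}

public class ExtensionDispatchDemo {
    public static void main(String[] args) {
        ExtensionHandlerRegistry registry = new ExtensionHandlerRegistry();
        // Plain-text handler: keep the raw contents in a "rawcontents" field.
        registry.register("txt", (path, raw) -> {
            Map<String, String> fields = new HashMap<>();
            fields.put("path", path);
            fields.put("rawcontents", raw);
            return fields;
        });
        ContentHandler handler = registry.lookup("docs/README.TXT");
        System.out.println(handler.extractFields("docs/README.TXT", "hello").get("rawcontents")); // hello
    }
}
```

Keying the lookup on the lower-cased extension lets `README.TXT` and `notes.txt` share one handler; a DataSource layer like the one Kelvin describes would sit in front of this and supply the path and raw contents.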
>
> K
> ----- Original Message -----
> From: Mark Tucker
> To: Lucene Developers List
> Sent: Friday, February 08, 2002 4:03 AM
> Subject: RE: Proposal for Lucene
>
>
> I like what you included in your proposal and suggest doing all that (over
> time) and taking the following into consideration:
>
> Indexers/Crawlers
>
>   General Settings
>     SleeptimeBetweenCalls - can be used to avoid flooding a machine with too
>       many requests
>     IndexerTimeout - kill this crawler thread after a long period of inactivity
>     IncludeFilter - include only items matching filter
>     ExcludeFilter - exclude items matching filter (can be used with
>       IncludeFilter)
>     MaxItems - stops indexing after x items
>     MaxMegs - stops indexing after x MB of data
>
>   File System Indexer
>     URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
>
>   Web Indexer
>     HTTPUser
>     HTTPPassword
>     HTTPUserAgent
>     ProxyServer
>     ProxyUser
>     ProxyPassword
>     HTTPSCertificate
>     HTTPSPrivateKey
>
>   Other Possible Indexers
>     Microsoft Exchange 5.5/2000
>     Lotus Notes
>     Newsgroup (NNTP)
>     Documentum
>     ODBC/OLEDB
>     XML - index a single XML file that represents multiple documents
>
>
> Document Factory
>   General
>     The minimum properties for each document should be:
>       URL
>       Title
>       Abstract
>       Full Text
>       Score
>
>   HTML
>     Support for META tags including Dublin Core syntax
>
>   Other Possible Document Factories
>     Office Docs - DOC, XLS, PPT
>     PDF
>
>
> Thanks for the great proposal.
>
> Mark Tucker
>
>
> -----Original Message-----
> From: Andrew C. Oliver [mailto:acoliver@apache.org]
> Sent: Thursday, February 07, 2002 5:35 AM
> To: Lucene Developers List
> Subject: Proposal for Lucene
>
>
> Hi All,
>
> These are just a few thoughts about Lucene. Please send me your feedback,
> critiques and thoughts.
>
> If you folks would take a look:
>
> http://www.trilug.org/~acoliver/luceneplan.html
>
> If you'd like to submit patches:
>
> http://www.trilug.org/~acoliver/luceneplan.xml
>
> Once I've gotten feedback from the developer community I'll send this to
> the user community as well.
>
> Thanks,
>
> Andy
> --
> www.superlinksoftware.com
> www.sourceforge.net/projects/poi - port of Excel format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> - fix java generics!
>
>
> The avalanche has already started. It is too late for the pebbles to
> vote.
> -Ambassador Kosh
>
>
> --
> To unsubscribe, e-mail:
> For additional commands, e-mail:
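Several of the general crawler settings in Mark Tucker's list (IncludeFilter, ExcludeFilter, MaxItems, MaxMegs, and the file-system URLReplacePrefix) are simple enough to sketch. The Java below is a hypothetical illustration only: `CrawlSettings`, `CrawlGate` and their members are invented names, the filters are assumed to be `java.util.regex` patterns matched against the whole path, the example host `myserver` is made up, and SleeptimeBetweenCalls and IndexerTimeout are left out.

```java
import java.util.regex.Pattern;

/** Hypothetical settings object; field names loosely follow Mark Tucker's list. */
class CrawlSettings {
    Pattern includeFilter;             // IncludeFilter: null means include everything
    Pattern excludeFilter;             // ExcludeFilter: null means exclude nothing
    int maxItems = Integer.MAX_VALUE;  // MaxItems
    long maxBytes = Long.MAX_VALUE;    // MaxMegs, stored in bytes
    String localPrefix;                // URLReplacePrefix: local root, e.g. "c:\\"
    String urlPrefix;                  // URLReplacePrefix: exposed root, e.g. "http://myserver/docs/"

    void setMaxMegs(long megs) {
        maxBytes = megs * 1024L * 1024L;
    }
}

/** Applies the settings while a (hypothetical) crawler walks its data source. */
class CrawlGate {
    private final CrawlSettings settings;
    private int items;
    private long bytes;

    CrawlGate(CrawlSettings settings) {
        this.settings = settings;
    }

    /** True if the path matches the include filter and does not match the exclude filter. */
    boolean accepts(String path) {
        if (settings.includeFilter != null && !settings.includeFilter.matcher(path).matches()) {
            return false;
        }
        return settings.excludeFilter == null || !settings.excludeFilter.matcher(path).matches();
    }

    /** Counts one item of the given size; returns false (counting nothing) if a limit would be exceeded. */
    boolean tryRecord(long sizeBytes) {
        // Subtracting instead of adding avoids long overflow when maxBytes is Long.MAX_VALUE.
        if (items >= settings.maxItems || sizeBytes > settings.maxBytes - bytes) {
            return false;
        }
        items++;
        bytes += sizeBytes;
        return true;
    }

    /** Rewrites a local path under localPrefix into a URL under urlPrefix. */
    String toUrl(String localPath) {
        if (settings.localPrefix == null || !localPath.startsWith(settings.localPrefix)) {
            return localPath;
        }
        String rest = localPath.substring(settings.localPrefix.length()).replace('\\', '/');
        return settings.urlPrefix + rest;
    }
}

public class CrawlGateDemo {
    public static void main(String[] args) {
        CrawlSettings s = new CrawlSettings();
        s.includeFilter = Pattern.compile(".*\\.(html|txt)");
        s.excludeFilter = Pattern.compile(".*\\\\private\\\\.*"); // any path containing \private\
        s.maxItems = 2;
        s.localPrefix = "c:\\";
        s.urlPrefix = "http://myserver/docs/";
        CrawlGate gate = new CrawlGate(s);

        System.out.println(gate.accepts("c:\\guide\\intro.html"));  // true
        System.out.println(gate.accepts("c:\\private\\notes.txt")); // false
        System.out.println(gate.toUrl("c:\\guide\\intro.html"));    // http://myserver/docs/guide/intro.html
    }
}
```

Checking a limit before counting an item means a crawler can stop, or skip the item, as soon as `tryRecord` returns false, instead of overshooting MaxItems or MaxMegs by one document.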