lucene-dev mailing list archives

From "Andrew C. Oliver" <acoli...@apache.org>
Subject Re: Proposal for Lucene
Date Fri, 08 Feb 2002 13:19:18 GMT
Is this open source?  APL'd?  Where can I look at it?

On Thu, 2002-02-07 at 22:00, Erik Hatcher wrote:
> I've developed something similar myself.  I've created an Ant task, <index>,
> that delegates to classes implementing a DocumentHandler interface; one such
> class (selected with <index class="...">) is a FileExtensionDocumentHandler.
> At build time I generate a Lucene index of static documents and roll it into
> a web application.
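For readers following the thread, here is a minimal sketch of the extension-based dispatch described above. It is not Erik's code: the method signature on DocumentHandler, the class name ExtensionDispatchHandler, and the use of a plain field map in place of a Lucene Document are all assumptions made for illustration.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: turns a file into a flat map of field name -> value.
// A real task would presumably build a Lucene Document here instead.
interface DocumentHandler {
    Map<String, String> getDocument(File file);
}

// Dispatches to a per-extension handler, in the spirit of the
// FileExtensionDocumentHandler mentioned in the thread (name is an assumption).
class ExtensionDispatchHandler implements DocumentHandler {
    private final Map<String, DocumentHandler> handlers = new HashMap<>();

    // Registers a handler for an extension given without the dot, e.g. "html".
    void register(String extension, DocumentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Returns the lower-cased text after the last dot, or "" if there is no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    @Override
    public Map<String, String> getDocument(File file) {
        DocumentHandler handler = handlers.get(extensionOf(file.getName()));
        if (handler == null) {
            return null; // no handler registered for this extension: skip the file
        }
        return handler.getDocument(file);
    }
}
```

A caller would register one handler per extension and pass every file through the dispatcher, skipping files for which it returns null.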
> 
> It's got some kinks, like how to deal with the documents because they contain
> relative hyperlinks... so these documents either should be copied into the
> WAR too (or somehow made accessible to the web app) or incorporated directly
> into a Lucene field ("rawcontents" is what I'm using now).  These issues are
> not tough to solve and having some additional parameters to my IndexTask
> could allow such things to be customized by the user.
> 
> My task is still evolving, but my plan all along has been to donate it to
> lucene-dev for incorporation in some form or another.
> 
> Let me know if you'd like it, and what package name you'd like to use.
> 
>     Erik
> 
> 
> ----- Original Message -----
> From: "Kelvin Tan" <kelvin@relevanz.com>
> To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
> Sent: Thursday, February 07, 2002 8:27 PM
> Subject: Re: Proposal for Lucene
> 
> 
> Great suggestions all around, and I'm pretty much in agreement with what's
> been said.
> 
> In my app, I've built a mini-framework around the searching such that I'm
> able to map ContentHandlers (which index file contents) to file extensions.
> I've been wanting to clean it up and contribute it for a while, but haven't
> overcome the inertia to do so. Also introduced a DataSource (which can
> pretty much be anything, like a filesystem, a database, a URL, etc.) from
> which to obtain the data to index, so I think it _could_ be in line with what
> some of you have in mind.
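The DataSource idea above can be sketched roughly as follows. This is not Kelvin's framework: SourceItem, MapDataSource, and SourceIndexer are hypothetical names, and the in-memory source merely stands in for a filesystem, database, or URL source so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical item handed from a data source to the indexer.
class SourceItem {
    final String id;      // e.g. a path, a URL, or a primary key
    final String content; // raw text to index

    SourceItem(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

// Hypothetical abstraction: anything that can enumerate items to index.
interface DataSource extends Iterable<SourceItem> { }

// In-memory source standing in for a filesystem, database, or URL source.
class MapDataSource implements DataSource {
    private final Map<String, String> entries = new LinkedHashMap<>();

    MapDataSource add(String id, String content) {
        entries.put(id, content);
        return this;
    }

    @Override
    public Iterator<SourceItem> iterator() {
        List<SourceItem> items = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            items.add(new SourceItem(e.getKey(), e.getValue()));
        }
        return items.iterator();
    }
}

// The indexing loop depends only on the DataSource interface.
class SourceIndexer {
    // Returns the ids of the items visited, in iteration order.
    static List<String> indexAll(DataSource source) {
        List<String> indexedIds = new ArrayList<>();
        for (SourceItem item : source) {
            // A real implementation would add the item to a Lucene index here.
            indexedIds.add(item.id);
        }
        return indexedIds;
    }
}
```

Because the loop only sees the interface, a new kind of source can be added without touching the indexing code.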
> 
> I could also use a lot of feedback on what's been done too...
> 
> So what's the plan to move forward?
> 
> K
>   ----- Original Message -----
>   From: Mark Tucker
>   To: Lucene Developers List
>   Sent: Friday, February 08, 2002 4:03 AM
>   Subject: RE: Proposal for Lucene
> 
> 
>   I like what you included in your proposal and suggest doing all that (over
> time) and taking the following into consideration:
> 
>   Indexers/Crawlers
> 
>   General Settings
>   SleeptimeBetweenCalls - can be used to avoid flooding a machine with too
> many requests
>   IndexerTimeout - kill this crawler thread after a long period of inactivity
>   IncludeFilter - include only items matching filter
>   ExcludeFilter - exclude items matching filter (can be used with
> IncludeFilter)
>   MaxItems - stops indexing after x items
>   MaxMegs - stops indexing after x MB of data
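One way the filter and limit settings listed above might fit together is sketched below. The list does not say what form the filters take, so treating IncludeFilter and ExcludeFilter as regular expressions is an assumption, as are the class names CrawlSettings and CrawlBudget. SleeptimeBetweenCalls and IndexerTimeout are left out to keep the sketch short.

```java
import java.util.regex.Pattern;

// Hypothetical settings holder; field names follow the list above.
class CrawlSettings {
    Pattern includeFilter;          // null means "include everything"
    Pattern excludeFilter;          // null means "exclude nothing"
    int maxItems = Integer.MAX_VALUE;
    long maxBytes = Long.MAX_VALUE; // MaxMegs converted to bytes

    // An item is accepted if it matches the include filter (when set)
    // and does not match the exclude filter (when set).
    boolean accepts(String name) {
        if (includeFilter != null && !includeFilter.matcher(name).find()) {
            return false;
        }
        if (excludeFilter != null && excludeFilter.matcher(name).find()) {
            return false;
        }
        return true;
    }
}

// Tracks progress against MaxItems and MaxMegs.
class CrawlBudget {
    private final CrawlSettings settings;
    private int items;
    private long bytes;

    CrawlBudget(CrawlSettings settings) {
        this.settings = settings;
    }

    // Records one indexed item. Returns true while the crawl may continue,
    // false once either limit has been reached.
    boolean record(long sizeInBytes) {
        items++;
        bytes += sizeInBytes;
        return items < settings.maxItems && bytes < settings.maxBytes;
    }
}
```

A crawler loop would call accepts() before fetching an item and stop as soon as record() returns false.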
> 
>   File System Indexer
>   URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
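The URLReplacePrefix setting amounts to swapping a local path prefix for a URL prefix. A minimal sketch follows, assuming both prefixes end with a separator; UrlPrefixMapper is a hypothetical name, and the example does no URL encoding or case-insensitive matching.

```java
// Hypothetical helper that rewrites a crawled local path into a public URL.
class UrlPrefixMapper {
    private final String localPrefix; // e.g. "c:\\docs\\" (assumed to end with a separator)
    private final String urlPrefix;   // e.g. "http://example.invalid/docs/" (assumed to end with "/")

    UrlPrefixMapper(String localPrefix, String urlPrefix) {
        this.localPrefix = localPrefix;
        this.urlPrefix = urlPrefix;
    }

    // Replaces the local prefix with the URL prefix and converts
    // backslashes in the remainder to forward slashes.
    String toUrl(String localPath) {
        if (!localPath.startsWith(localPrefix)) {
            throw new IllegalArgumentException("Path is outside the crawled root: " + localPath);
        }
        String rest = localPath.substring(localPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }
}
```

The indexer would store the mapped URL in the document instead of the local path, so search results link to the published location.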
> 
>   Web Indexer
>   HTTPUser
>   HTTPPassword
>   HTTPUserAgent
>   ProxyServer
>   ProxyUser
>   ProxyPassword
>   HTTPSCertificate
>   HTTPSPrivateKey
> 
>   Other Possible Indexers
>   Microsoft Exchange 5.5/2000
>   Lotus Notes
>   Newsgroup (NNTP)
>   Documentum
>   ODBC/OLEDB
>   XML - index single XML that represents multiple documents
> 
> 
>   Document Factory
>   General
>   The minimum properties for each document should be:
>   URL
>   Title
>   Abstract
>   Full Text
>   Score
> 
>   HTML
>   Support for META tags including Dublin Core syntax
> 
>   Other Possible Document Factories
>   Office Docs - DOC, XLS, PPT
>   PDF
> 
> 
>   Thanks for the great proposal.
> 
>   Mark Tucker
> 
> 
>   -----Original Message-----
>   From: Andrew C. Oliver [mailto:acoliver@apache.org]
>   Sent: Thursday, February 07, 2002 5:35 AM
>   To: Lucene Developers List
>   Subject: Proposal for Lucene
> 
> 
>   Hi All,
> 
>   These are just a few thoughts about Lucene.  Please send me your feedback,
>   critiques and thoughts.
> 
>   If you folks would take a look:
> 
>   http://www.trilug.org/~acoliver/luceneplan.html
> 
>   if you'd like to submit patches:
> 
>   http://www.trilug.org/~acoliver/luceneplan.xml
> 
>   Once I've gotten feedback from the developer community I'll send this to
>   the user community as well.
> 
>   Thanks,
> 
>   Andy
>   --
>   www.superlinksoftware.com
>   www.sourceforge.net/projects/poi - port of Excel format to java
>   http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
>   - fix java generics!
> 
> 
>   The avalanche has already started. It is too late for the pebbles to
>   vote.
>   -Ambassador Kosh
> 
> 
>   --
>   To unsubscribe, e-mail:
> <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
>   For additional commands, e-mail:
> <mailto:lucene-dev-help@jakarta.apache.org>
> 
> 
> 
-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
			- fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh



