lucene-dev mailing list archives

From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: Proposal for Lucene
Date Sat, 09 Feb 2002 12:58:52 GMT
Here it is. Released under APL (I kinda copied and pasted the license from
some Fulcrum code). Some (current) limitations:

1. Only a single datasource is supported at this point in time (support for
multiple datasources can be easily added through the configuration file and
improving SearchConfiguration)
2. Documentation isn't really complete. (Is it ever?)
3. It's a filesystem-based indexer. It's not too difficult to decouple the
filesystem bit and make it more generic, but I don't have a need for it
presently.
4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried
using outputstreams but they turned out to be quite a nightmare...
5. There's a JDBCDatasource for indexing a table from a database. The table
stores metadata about the file to index, so there still has to be some way
to obtain the file itself (this ties back to 3.). I really ought to provide
an example of how to use it...
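
As a rough illustration of the ContentHandler-to-extension mapping and the
DataSource idea (this is not the attached code; the HandlerSketch and
HandlerRegistry classes, the extractText, items, register, lookup and index
methods, and all signatures are made up for the example), something along
these lines:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class HandlerSketch {

    // Hypothetical: turns the raw content of one item into indexable text.
    interface ContentHandler {
        String extractText(byte[] content);
    }

    // Hypothetical: anything that can hand back named items and their bytes
    // (a filesystem, a database table, a list of URLs, ...).
    interface DataSource {
        Map<String, byte[]> items();
    }

    // Maps lower-cased file extensions to handlers.
    static class HandlerRegistry {
        private final Map<String, ContentHandler> handlers = new HashMap<>();

        void register(String extension, ContentHandler handler) {
            handlers.put(extension.toLowerCase(Locale.ROOT), handler);
        }

        // Returns null when the name has no extension or no handler is mapped.
        ContentHandler lookup(String name) {
            int dot = name.lastIndexOf('.');
            if (dot < 0 || dot == name.length() - 1) {
                return null;
            }
            return handlers.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
        }
    }

    // Walks the datasource and collects "name: text" for every item that has
    // a handler. A real indexer would build a Lucene Document here instead.
    static List<String> index(DataSource source, HandlerRegistry registry) {
        List<String> indexed = new ArrayList<>();
        for (Map.Entry<String, byte[]> item : source.items().entrySet()) {
            ContentHandler handler = registry.lookup(item.getKey());
            if (handler != null) {
                indexed.add(item.getKey() + ": " + handler.extractText(item.getValue()));
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        registry.register("txt", content -> new String(content, StandardCharsets.UTF_8));

        // An in-memory stand-in for a filesystem or JDBC datasource.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("notes.TXT", "hello lucene".getBytes(StandardCharsets.UTF_8));
        files.put("photo.jpg", new byte[] {1, 2, 3});
        DataSource source = () -> files;

        // photo.jpg is skipped because no handler is mapped to "jpg".
        System.out.println(index(source, registry)); // prints [notes.TXT: hello lucene]
    }
}
```

Keeping the DataSource and the ContentHandler apart is what would let the
same handlers work against something other than a filesystem.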

Questions and feedback are really welcome.

I've attached the source-only version, but there's a full version (with
libs) at http://www.relevanz.com/search_full.zip.

----- Original Message -----
From: Andrew C. Oliver <acoliver@apache.org>
To: Lucene Developers List <lucene-dev@jakarta.apache.org>
Sent: Friday, February 08, 2002 9:18 PM
Subject: Re: Proposal for Lucene


> Is this open source?  APL'd?  Where can I look at it?
>
> -Andy
>
> On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > Great suggestions all around, and I'm pretty much in agreement with
> > what's been said.
> >
> > In my app, I've built a mini-framework around the searching so that I
> > can map ContentHandlers (which index file contents) to file extensions.
> > I've been wanting to clean it up and contribute it for a while, but
> > haven't overcome the inertia to do so. I also introduced a DataSource
> > (which can be pretty much anything: a filesystem, a database, a URL,
> > etc.) from which to obtain the data to index, so I think it _could_ be
> > in line with what some of you have in mind.
> >
> > I could also use a lot of feedback on what's been done so far...
> >
> > So what's the plan to move forward?
> >
> > K
> >   ----- Original Message -----
> >   From: Mark Tucker
> >   To: Lucene Developers List
> >   Sent: Friday, February 08, 2002 4:03 AM
> >   Subject: RE: Proposal for Lucene
> >
> >
> >   I like what you included in your proposal and suggest doing all that
> >   (over time) and taking the following into consideration:
> >
> >   Indexers/Crawlers
> >
> >   General Settings
> >   SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
> >   IndexerTimeout - kill this crawler thread after long period of inactivity
> >   IncludeFilter - include only items matching filter
> >   ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
> >   MaxItems - stops indexing after x items
> >   MaxMegs - stops indexing after x MB of data
> >
> >   File System Indexer
> >   URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
> >
> >   Web Indexer
> >   HTTPUser
> >   HTTPPassword
> >   HTTPUserAgent
> >   ProxyServer
> >   ProxyUser
> >   ProxyPassword
> >   HTTPSCertificate
> >   HTTPSPrivateKey
> >
> >   Other Possible Indexers
> >   Microsoft Exchange 5.5/2000
> >   Lotus Notes
> >   Newsgroup (NNTP)
> >   Documentum
> >   ODBC/OLEDB
> >   XML - index single XML that represents multiple documents
> >
> >
> >   Document Factory
> >   General
> >   The minimum properties for each document should be:
> >   URL
> >   Title
> >   Abstract
> >   Full Text
> >   Score
> >
> >   HTML
> >   Support for META tags including Dublin Core syntax
> >
> >   Other Possible Document Factories
> >   Office Docs - DOC, XLS, PPT
> >   PDF
> >
> >
> >   Thanks for the great proposal.
> >
> >   Mark Tucker
> >
> >
> >   -----Original Message-----
> >   From: Andrew C. Oliver [mailto:acoliver@apache.org]
> >   Sent: Thursday, February 07, 2002 5:35 AM
> >   To: Lucene Developers List
> >   Subject: Proposal for Lucene
> >
> >
> >   Hi All,
> >
> >   These are just a few thoughts about Lucene.  Please send me your
> >   feedback, critiques and thoughts.
> >
> >   If you folks would take a look:
> >
> >   http://www.trilug.org/~acoliver/luceneplan.html
> >
> >   if you'd like to submit patches:
> >
> >   http://www.trilug.org/~acoliver/luceneplan.xml
> >
> >   Once I've gotten feedback from the developer community I'll send this
> >   to the user community as well.
> >
> >   Thanks,
> >
> >   Andy
> >   --
> >   www.superlinksoftware.com
> >   www.sourceforge.net/projects/poi - port of Excel format to java
> >   http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> >   - fix java generics!
> >
> >
> >   The avalanche has already started. It is too late for the pebbles to
> >   vote.
> >   -Ambassador Kosh
> >
> >
> >   --
> >   To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> >   For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> >
> >
