lucene-dev mailing list archives

From "Andrew C. Oliver" <acoli...@apache.org>
Subject Re: Proposal for Lucene
Date Sat, 09 Feb 2002 12:57:50 GMT
On Sat, 2002-02-09 at 07:58, Kelvin Tan wrote:
> Here it is. Released under APL (I kinda copied and pasted the license from
> some Fulcrum code). Some (current) limitations:
> 
> 1. Only a single datasource is supported at this point in time (support for
> multiple datasources can be easily added through the configuration file and
> improving SearchConfiguration)
> 2. Documentation isn't really complete. (Is it ever?)
> 3. It's a filesystem-based indexer. It's not too difficult to decouple the
> filesystem bit and make it more generic, but I don't have a need for it
> presently.
> 4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried
> using outputstreams but they turned out to be quite a nightmare...

Great, I'll take a look at all of this when I get back next week (I'm going
to Boston for a week and will be out of touch).

> 5. There's a JDBCDatasource for indexing a table from databases (the table
> stores metadata of the file to index. There should still be some way to
> obtain the file to index. This ties back to 3.). I really ought to provide
> an example on how to use it...
> 

What's that good for...?  Wouldn't one just create an index on the
database?

> Questions and feedback are really welcome.
> 
> I've attached the source-only version, but there's a full version (with
> libs) at http://www.relevanz.com/search_full.zip.
> 
> ----- Original Message -----
> From: Andrew C. Oliver <acoliver@apache.org>
> To: Lucene Developers List <lucene-dev@jakarta.apache.org>
> Sent: Friday, February 08, 2002 9:18 PM
> Subject: Re: Proposal for Lucene
> 
> 
> > Is this open source?  APL'd?  Where can I look at it?
> >
> > -Andy
> >
> > On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > > Great suggestions all around, and I'm pretty much in agreement with
> > > what's been said.
> > >
> > > In my app, I've built a mini-framework around the searching such that
> > > I'm able to map ContentHandlers (which index file contents) to file
> > > extensions. I've been wanting to clean it up and contribute it for a while,
> > > but haven't overcome the inertia to do so. I've also introduced a DataSource
> > > (which can be pretty much anything, like a filesystem, a database, a URL,
> > > etc.) from which to obtain the data to index, so I think it _could_ be in
> > > line with what some of you have in mind.
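
Illustrative sketch (not from the thread, and not Kelvin's actual code): the idea of mapping ContentHandlers to file extensions might look roughly like this in modern Java. The interface shape, the HandlerRegistry class and the fallback handler are all assumptions made for illustration; the DataSource abstraction is left out.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: turns the raw bytes of a file into indexable text.
interface ContentHandler {
    String extractText(byte[] content);
}

// Hypothetical registry that maps file extensions to content handlers.
class HandlerRegistry {
    private final Map<String, ContentHandler> handlers = new HashMap<>();
    private final ContentHandler fallback;

    // The fallback is used for unknown or missing extensions (an assumption).
    HandlerRegistry(ContentHandler fallback) {
        this.fallback = fallback;
    }

    // Extensions are stored lower-cased so that "TXT" and "txt" are equivalent.
    void register(String extension, ContentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Picks a handler from the text after the last dot in the file name.
    ContentHandler forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return fallback;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return handlers.getOrDefault(extension, fallback);
    }
}
```

Keeping the lookup in one place means a new file type only needs one register call, which seems to be the point of the mini-framework described above.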
> > >
> > > I could also use a lot of feedback on what's been done too...
> > >
> > > So what's the plan to move forward?
> > >
> > > K
> > >   ----- Original Message -----
> > >   From: Mark Tucker
> > >   To: Lucene Developers List
> > >   Sent: Friday, February 08, 2002 4:03 AM
> > >   Subject: RE: Proposal for Lucene
> > >
> > >
> > >   I like what you included in your proposal and suggest doing all that
> > >   (over time) and taking the following into consideration:
> > >
> > >   Indexers/Crawlers
> > >
> > >   General Settings
> > >   SleeptimeBetweenCalls - can be used to avoid flooding a machine with
> > >   too many requests
> > >   IndexerTimeout - kill this crawler thread after a long period of
> > >   inactivity
> > >   IncludeFilter - include only items matching filter
> > >   ExcludeFilter - exclude items matching filter (can be used with
> > >   IncludeFilter)
> > >   MaxItems - stops indexing after x items
> > >   MaxMegs - stops indexing after x MB of data
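
Illustrative sketch (not from the thread): one way IncludeFilter, ExcludeFilter and MaxItems could interact. The CrawlSettings class is hypothetical, and treating the filters as regular expressions is an assumption, since the message does not say what kind of filter is meant. SleeptimeBetweenCalls, IndexerTimeout and MaxMegs are not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical holder for three of the general settings listed above.
class CrawlSettings {
    private final Pattern includeFilter; // null means "include everything"
    private final Pattern excludeFilter; // null means "exclude nothing"
    private final int maxItems;

    CrawlSettings(Pattern includeFilter, Pattern excludeFilter, int maxItems) {
        this.includeFilter = includeFilter;
        this.excludeFilter = excludeFilter;
        this.maxItems = maxItems;
    }

    // An item must match the include filter (if any) and must not match
    // the exclude filter (if any); exclusion wins when both match.
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).find()) {
            return false;
        }
        return excludeFilter == null || !excludeFilter.matcher(item).find();
    }

    // Returns the accepted items in order, stopping once maxItems is reached.
    List<String> select(Iterable<String> candidates) {
        List<String> selected = new ArrayList<>();
        for (String candidate : candidates) {
            if (selected.size() >= maxItems) {
                break;
            }
            if (accepts(candidate)) {
                selected.add(candidate);
            }
        }
        return selected;
    }
}
```

Here MaxItems counts accepted items rather than items visited; the opposite reading is equally plausible and would only move the size check.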
> > >
> > >   File System Indexer
> > >   URLReplacePrefix - can crawl c:\ but expose URL as
> > >   http://mysever/docs/
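
Illustrative sketch (not from the thread): URLReplacePrefix could amount to swapping the crawled path prefix for a URL prefix. The UrlPrefixMapper class is hypothetical; the sketch compares the prefix exactly (case-sensitively) and does no URL encoding, both of which a real indexer would likely need to revisit.

```java
// Hypothetical helper that rewrites a crawled file path into a public URL.
class UrlPrefixMapper {
    private final String localPrefix;
    private final String urlPrefix;

    UrlPrefixMapper(String localPrefix, String urlPrefix) {
        this.localPrefix = localPrefix;
        this.urlPrefix = urlPrefix;
    }

    // Replaces the local prefix with the URL prefix and turns Windows
    // backslashes in the remainder into forward slashes.
    String toUrl(String localPath) {
        if (!localPath.startsWith(localPrefix)) {
            throw new IllegalArgumentException("Path is outside the crawl root: " + localPath);
        }
        String rest = localPath.substring(localPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }
}
```

With the values from the message, `new UrlPrefixMapper("c:\\", "http://mysever/docs/").toUrl("c:\\manuals\\intro.html")` yields `http://mysever/docs/manuals/intro.html`.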
> > >
> > >   Web Indexer
> > >   HTTPUser
> > >   HTTPPassword
> > >   HTTPUserAgent
> > >   ProxyServer
> > >   ProxyUser
> > >   ProxyPassword
> > >   HTTPSCertificate
> > >   HTTPSPrivateKey
> > >
> > >   Other Possible Indexers
> > >   Microsoft Exchange 5.5/2000
> > >   Lotus Notes
> > >   Newsgroup (NNTP)
> > >   Documentum
> > >   ODBC/OLEDB
> > >   XML - index single XML that represents multiple documents
> > >
> > >
> > >   Document Factory
> > >   General
> > >   The minimum properties for each document should be:
> > >   URL
> > >   Title
> > >   Abstract
> > >   Full Text
> > >   Score
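
Illustrative sketch (not from the thread): the five minimum properties could be carried in a small value class. The SearchItem name, the field types and the summarize fallback (deriving an abstract from the start of the full text) are assumptions; the score is assumed to be filled in per search hit, since it depends on the query.

```java
// Hypothetical value class holding the minimum properties listed above.
final class SearchItem {
    final String url;
    final String title;
    final String summary;  // the "Abstract"; `abstract` is a reserved word in Java
    final String fullText;
    final float score;     // assumed to be set per query hit

    SearchItem(String url, String title, String summary, String fullText, float score) {
        this.url = url;
        this.title = title;
        this.summary = summary;
        this.fullText = fullText;
        this.score = score;
    }

    // Assumed fallback for documents without an abstract: take the first
    // maxChars characters of the trimmed text and mark the cut with "...".
    static String summarize(String fullText, int maxChars) {
        String text = fullText.trim();
        if (text.length() <= maxChars) {
            return text;
        }
        return text.substring(0, maxChars).trim() + "...";
    }
}
```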
> > >
> > >   HTML
> > >   Support for META tags including Dublin Core syntax
> > >
> > >   Other Possible Document Factories
> > >   Office Docs - DOC, XLS, PPT
> > >   PDF
> > >
> > >
> > >   Thanks for the great proposal.
> > >
> > >   Mark Tucker
> > >
> > >
> > >   -----Original Message-----
> > >   From: Andrew C. Oliver [mailto:acoliver@apache.org]
> > >   Sent: Thursday, February 07, 2002 5:35 AM
> > >   To: Lucene Developers List
> > >   Subject: Proposal for Lucene
> > >
> > >
> > >   Hi All,
> > >
> > >   These are just a few thoughts about Lucene.  Please send me your
> > >   feedback, critiques and thoughts.
> > >
> > >   If you folks would take a look:
> > >
> > >   http://www.trilug.org/~acoliver/luceneplan.html
> > >
> > >   if you'd like to submit patches:
> > >
> > >   http://www.trilug.org/~acoliver/luceneplan.xml
> > >
> > >   Once I've gotten feedback from the developer community I'll send this
> > >   to the user community as well.
> > >
> > >   Thanks,
> > >
> > >   Andy
> > >   --
> > >   www.superlinksoftware.com
> > >   www.sourceforge.net/projects/poi - port of Excel format to java
> > >   http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> > >   - fix java generics!
> > >
> > >
> > >   The avalanche has already started. It is too late for the pebbles to
> > >   vote.
> > >   -Ambassador Kosh
> > >
> > >
> > >   --
> > >   To unsubscribe, e-mail:
> > >   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > >   For additional commands, e-mail:
> > >   <mailto:lucene-dev-help@jakarta.apache.org>
> > >
> > >
> > >
> > >
> >
> ----
> 

-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
			- fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh

