lucene-dev mailing list archives

From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: Proposal for Lucene
Date Tue, 26 Feb 2002 09:00:39 GMT

----- Original Message -----
From: "Andrew C. Oliver" <acoliver@apache.org>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Monday, February 25, 2002 12:48 AM
Subject: Re: Proposal for Lucene


> Wow this is an awesome starting point!  I'm awed!  The object model is
> nice and abstracted and yet clean and simple.  I only scanned it but I
> already feel like I understand it.  Are you okay with us putting this in
> a scratchpad area in the Lucene repository (I gather "yes") and refactoring
> it as a starting point?

I'd be more than happy if you could do that. It would be nice if Lucene had
the equivalent of the commons-sandbox or turbine-stratum, a workspace of
sorts.

Regards,
Kelvin

>
> Has anyone else looked at this?  Any objections?
>
> -Andy
>
>
> On Sat, 2002-02-09 at 07:58, Kelvin Tan wrote:
> > Here it is. Released under APL (I kinda copied and pasted the license from
> > some Fulcrum code). Some (current) limitations:
> >
> > 1. Only a single datasource is supported at this point in time (support for
> > multiple datasources can easily be added through the configuration file and
> > by improving SearchConfiguration).
> > 2. Documentation isn't really complete. (Is it ever?)
> > 3. It's a filesystem-based indexer. It's not too difficult to decouple the
> > filesystem bit and make it more generic, but I don't have a need for it
> > presently.
> > 4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried
> > using output streams, but they turned out to be quite a nightmare...
> > 5. There's a JDBCDatasource for indexing a table from a database (the table
> > stores metadata of the file to index; there still has to be some way to
> > obtain the file itself, which ties back to 3). I really ought to provide
> > an example of how to use it...
> >
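Limitation 4 describes unpacking archives into a temp folder before indexing. A minimal Java sketch of that approach for ZIP files, using only `java.util.zip` and `java.nio.file`, might look like this; the class `ZipTempExtractor` is hypothetical and not taken from Kelvin's code. GZip could be handled similarly with `java.util.zip.GZIPInputStream`, but the JDK has no built-in Tar reader, so Tar is not covered here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: unpacks a ZIP stream into a fresh temp folder so each entry can be indexed as a plain file. */
final class ZipTempExtractor {
    private ZipTempExtractor() {}

    /** Extracts every file entry into a new temp directory and returns the extracted file paths in archive order. */
    static List<Path> extract(InputStream zipStream) throws IOException {
        Path tempDir = Files.createTempDirectory("indexer-");
        List<Path> extracted = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path target = tempDir.resolve(entry.getName()).normalize();
                // Reject entries such as "../evil" that would escape the temp folder ("zip slip").
                if (!target.startsWith(tempDir)) {
                    throw new IOException("Entry outside temp folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // ZipInputStream reports end-of-stream at the end of the current entry,
                    // so this copies exactly one entry and leaves the stream open.
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                    extracted.add(target);
                }
                zin.closeEntry();
            }
        }
        return extracted;
    }
}
```

Each returned path can then be passed to whatever indexes that file type. Cleaning up the temp folder afterwards is left to the caller in this sketch.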
> > Questions and feedback are really welcome.
> >
> > I've attached the source-only version, but there's a full version (with
> > libs) at http://www.relevanz.com/search_full.zip.
> >
> > ----- Original Message -----
> > From: Andrew C. Oliver <acoliver@apache.org>
> > To: Lucene Developers List <lucene-dev@jakarta.apache.org>
> > Sent: Friday, February 08, 2002 9:18 PM
> > Subject: Re: Proposal for Lucene
> >
> >
> > > Is this open source?  APL'd?  Where can I look at it?
> > >
> > > -Andy
> > >
> > > On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > > > Great suggestions all around, and I'm pretty much in agreement with
> > > > what's been said.
> > > >
> > > > In my app, I've built a mini-framework around the searching that lets me
> > > > map ContentHandlers (which index file contents) to file extensions. I've
> > > > been wanting to clean it up and contribute it for a while, but haven't
> > > > overcome the inertia to do so. I also introduced a DataSource (which can
> > > > be pretty much anything, like a filesystem, a database, a URL, etc.) from
> > > > which to obtain the data to index, so I think it _could_ be in line with
> > > > what some of you have in mind.
> > > >
> > > > I could also use a lot of feedback on what's been done so far...
> > > >
> > > > So what's the plan to move forward?
> > > >
> > > > K
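The mini-framework described above maps ContentHandlers to file extensions and pulls items from a DataSource. The actual interfaces were not posted in this message, so the following Java sketch assumes hypothetical signatures (`DataSource.listItems()`, `ContentHandler.extractText(String)`) and a hypothetical `ContentHandlerRegistry`, purely to illustrate extension-based dispatch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical shape of a DataSource: anything (filesystem, database, URL, ...) that can list items to index. */
interface DataSource {
    List<String> listItems();
}

/** Hypothetical shape of a ContentHandler: turns one item into indexable text. */
interface ContentHandler {
    String extractText(String item);
}

/** Maps file extensions (case-insensitively) to the ContentHandler that indexes them. */
final class ContentHandlerRegistry {
    private final Map<String, ContentHandler> byExtension = new HashMap<>();

    void register(String extension, ContentHandler handler) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the handler registered for the item's extension, or null if there is none. */
    ContentHandler handlerFor(String item) {
        int dot = item.lastIndexOf('.');
        int slash = Math.max(item.lastIndexOf('/'), item.lastIndexOf('\\'));
        if (dot <= slash || dot == item.length() - 1) {
            return null; // the last path segment has no extension
        }
        return byExtension.get(item.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Walks a DataSource and extracts text from every item that has a registered handler. */
    Map<String, String> extractAll(DataSource source) {
        Map<String, String> texts = new LinkedHashMap<>();
        for (String item : source.listItems()) {
            ContentHandler handler = handlerFor(item);
            if (handler != null) {
                texts.put(item, handler.extractText(item));
            }
        }
        return texts;
    }
}
```

Items whose extension has no registered handler are skipped rather than failing the whole crawl; that is one design choice among several.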
> > > >   ----- Original Message -----
> > > >   From: Mark Tucker
> > > >   To: Lucene Developers List
> > > >   Sent: Friday, February 08, 2002 4:03 AM
> > > >   Subject: RE: Proposal for Lucene
> > > >
> > > >
> > > >   I like what you included in your proposal and suggest doing all that
> > > >   (over time) and taking the following into consideration:
> > > >
> > > >   Indexers/Crawlers
> > > >
> > > >   General Settings
> > > >   SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
> > > >   IndexerTimeout - kill this crawler thread after a long period of inactivity
> > > >   IncludeFilter - include only items matching the filter
> > > >   ExcludeFilter - exclude items matching the filter (can be combined with IncludeFilter)
> > > >   MaxItems - stops indexing after x items
> > > >   MaxMegs - stops indexing after x MB of data
> > > >
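Mark's general settings could be gathered into one settings object that the crawl loop consults before each fetch. This is a hypothetical Java sketch: the field names mirror the setting names above, the filters are assumed to be regular expressions, and the 60-second timeout default is an arbitrary placeholder.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for the general indexer/crawler settings listed above. */
final class CrawlerSettings {
    long sleepMillisBetweenCalls = 0;      // SleeptimeBetweenCalls
    long indexerTimeoutMillis = 60_000;    // IndexerTimeout (placeholder default)
    Pattern includeFilter;                 // IncludeFilter; null means "include everything"
    Pattern excludeFilter;                 // ExcludeFilter; null means "exclude nothing"
    int maxItems = Integer.MAX_VALUE;      // MaxItems
    long maxMegs = Long.MAX_VALUE;         // MaxMegs

    /** An item is crawled if it matches IncludeFilter (when set) and does not match ExcludeFilter. */
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).matches()) {
            return false;
        }
        return excludeFilter == null || !excludeFilter.matcher(item).matches();
    }

    /** True once MaxItems items or MaxMegs whole megabytes have been indexed. */
    boolean limitReached(int itemsIndexed, long bytesIndexed) {
        // Dividing bytes (rather than multiplying maxMegs) avoids overflow when maxMegs is Long.MAX_VALUE.
        return itemsIndexed >= maxItems || bytesIndexed / (1024L * 1024L) >= maxMegs;
    }

    /** True if the crawler thread has been idle longer than IndexerTimeout. */
    boolean timedOut(long lastActivityMillis, long nowMillis) {
        return nowMillis - lastActivityMillis > indexerTimeoutMillis;
    }

    /** Sleeps between requests so a single machine is not flooded. */
    void pauseBetweenCalls() throws InterruptedException {
        if (sleepMillisBetweenCalls > 0) {
            Thread.sleep(sleepMillisBetweenCalls);
        }
    }
}
```

A crawl loop would call `accepts` on each candidate item, `pauseBetweenCalls` between requests, and stop once `limitReached` or `timedOut` returns true.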
> > > >   File System Indexer
> > > >   URLReplacePrefix - can crawl c:\ but expose the URL as http://myserver/docs/
> > > >
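URLReplacePrefix amounts to swapping the crawl root for a public URL prefix. A minimal Java sketch, assuming a hypothetical `UrlReplacePrefix` class and no percent-encoding of file names:

```java
import java.nio.file.Path;

/** Hypothetical URLReplacePrefix: crawl files under rootDir but report them under urlPrefix. */
final class UrlReplacePrefix {
    private final Path rootDir;
    private final String urlPrefix;

    UrlReplacePrefix(Path rootDir, String urlPrefix) {
        this.rootDir = rootDir.toAbsolutePath().normalize();
        this.urlPrefix = urlPrefix.endsWith("/") ? urlPrefix : urlPrefix + "/";
    }

    /** Replaces the rootDir part of file with urlPrefix, joining the remaining path names with '/'. */
    String toUrl(Path file) {
        Path absolute = file.toAbsolutePath().normalize();
        if (!absolute.startsWith(rootDir)) {
            throw new IllegalArgumentException(file + " is not under " + rootDir);
        }
        Path relative = rootDir.relativize(absolute);
        StringBuilder url = new StringBuilder(urlPrefix);
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) {
                url.append('/');
            }
            url.append(relative.getName(i)); // no percent-encoding in this sketch
        }
        return url.toString();
    }
}
```

On Windows, with root `c:\` and prefix `http://myserver/docs/`, the file `c:\guide\intro.html` would map to `http://myserver/docs/guide/intro.html`.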
> > > >   Web Indexer
> > > >   HTTPUser
> > > >   HTTPPassword
> > > >   HTTPUserAgent
> > > >   ProxyServer
> > > >   ProxyUser
> > > >   ProxyPassword
> > > >   HTTPSCertificate
> > > >   HTTPSPrivateKey
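Most of the Web Indexer settings translate into request headers and a proxy choice. The following is a hypothetical `WebIndexerSettings` class built on standard JDK classes (`java.util.Base64`, `java.net.Proxy`); it assumes HTTP Basic authentication for both the server and the proxy, and it leaves out HTTPSCertificate/HTTPSPrivateKey, which would need an `SSLContext` with a configured key store.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the Web Indexer settings above (HTTPS client keys not covered). */
final class WebIndexerSettings {
    String httpUser;
    String httpPassword;
    String httpUserAgent;
    String proxyHost;      // ProxyServer host; null means connect directly
    int proxyPort = 8080;  // placeholder default
    String proxyUser;
    String proxyPassword;

    /** HTTP Basic credentials: "Basic " + base64(user + ":" + password). */
    static String basicAuth(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Headers to attach to each request the web indexer sends. */
    Map<String, String> requestHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        if (httpUserAgent != null) {
            headers.put("User-Agent", httpUserAgent);
        }
        if (httpUser != null) {
            headers.put("Authorization", basicAuth(httpUser, httpPassword));
        }
        if (proxyHost != null && proxyUser != null) {
            headers.put("Proxy-Authorization", basicAuth(proxyUser, proxyPassword));
        }
        return headers;
    }

    /** The proxy to use for every connection; Proxy.NO_PROXY when no ProxyServer is configured. */
    Proxy proxy() {
        if (proxyHost == null) {
            return Proxy.NO_PROXY;
        }
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }
}
```

The returned `Proxy` can be passed to `URL.openConnection(Proxy)`, and each header applied with `URLConnection.setRequestProperty`.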
> > > >
> > > >   Other Possible Indexers
> > > >   Microsoft Exchange 5.5/2000
> > > >   Lotus Notes
> > > >   Newsgroup (NNTP)
> > > >   Documentum
> > > >   ODBC/OLEDB
> > > >   XML - index single XML that represents multiple documents
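The XML indexer idea (one file holding many documents) boils down to splitting the file on a record element. Here is a hypothetical sketch using the JDK's DOM parser, assuming each logical document is an element named `recordTag` whose child elements become its fields:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical splitter for one XML file that holds many logical documents. */
final class XmlRecordSplitter {
    /** Returns one childName -> text map per element named recordTag, in document order. */
    static List<Map<String, String>> split(String xml, String recordTag) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPEs so an indexed file cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        org.w3c.dom.Document dom = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<Map<String, String>> records = new ArrayList<>();
        NodeList matches = dom.getElementsByTagName(recordTag);
        for (int i = 0; i < matches.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = matches.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(((Element) child).getTagName(), child.getTextContent().trim());
                }
            }
            records.add(fields);
        }
        return records;
    }
}
```

DOM loads the whole file into memory; for very large XML files a streaming parser (SAX or StAX) would be the safer choice.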
> > > >
> > > >
> > > >   Document Factory
> > > >   General
> > > >   The minimum properties for each document should be:
> > > >   URL
> > > >   Title
> > > >   Abstract
> > > >   Full Text
> > > >   Score
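The minimum property set could be carried by a small value class. In this hypothetical Java sketch, the abstract is stored as `abstractText` because `abstract` is a Java keyword, the score is treated as a per-query value rather than something stored at index time, and a missing abstract falls back to the first 200 characters of the full text (an arbitrary choice).

```java
/** Hypothetical value class holding the minimum per-document properties proposed above. */
final class SearchDocument {
    final String url;
    final String title;
    final String abstractText; // "abstract" is a reserved word in Java
    final String fullText;
    float score;               // filled in per query at search time, not stored at index time

    SearchDocument(String url, String title, String abstractText, String fullText) {
        this.url = url;
        this.title = title;
        // Fall back to a generated abstract when the source document has none.
        this.abstractText = (abstractText == null || abstractText.isEmpty())
                ? abstractFrom(fullText, 200)
                : abstractText;
        this.fullText = fullText;
    }

    /** First maxChars characters of the text (whitespace collapsed), cut back to a word boundary. */
    static String abstractFrom(String fullText, int maxChars) {
        String text = fullText.trim().replaceAll("\\s+", " ");
        if (text.length() <= maxChars) {
            return text;
        }
        int cut = text.lastIndexOf(' ', maxChars);
        return (cut > 0 ? text.substring(0, cut) : text.substring(0, maxChars)) + "...";
    }
}
```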
> > > >
> > > >   HTML
> > > >   Support for META tags including Dublin Core syntax
> > > >
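Picking up META tags, including Dublin Core names such as `DC.title`, can be sketched with a regular expression. This hypothetical `MetaTagExtractor` only handles `name` appearing before `content`, cannot handle a value that contains a quote character, and does not decode HTML entities; a real implementation would use an HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical META tag reader; Dublin Core entries end up under keys starting with "dc.". */
final class MetaTagExtractor {
    // <meta ... name="X" ... content="Y">, single or double quotes, name before content.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+[^>]*?name\\s*=\\s*[\"']([^\"']+)[\"'][^>]*?content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    static Map<String, String> extract(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Lower-case names so "DC.Title" and "dc.title" map to the same key.
            meta.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return meta;
    }
}
```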
> > > >   Other Possible Document Factories
> > > >   Office Docs - DOC, XLS, PPT
> > > >   PDF
> > > >
> > > >
> > > >   Thanks for the great proposal.
> > > >
> > > >   Mark Tucker
> > > >
> > > >
> > > >   -----Original Message-----
> > > >   From: Andrew C. Oliver [mailto:acoliver@apache.org]
> > > >   Sent: Thursday, February 07, 2002 5:35 AM
> > > >   To: Lucene Developers List
> > > >   Subject: Proposal for Lucene
> > > >
> > > >
> > > >   Hi All,
> > > >
> > > >   These are just a few thoughts about Lucene.  Please send me your
> > > >   feedback, critiques and thoughts.
> > > >
> > > >   If you folks would take a look:
> > > >
> > > >   http://www.trilug.org/~acoliver/luceneplan.html
> > > >
> > > >   if you'd like to submit patches:
> > > >
> > > >   http://www.trilug.org/~acoliver/luceneplan.xml
> > > >
> > > >   Once I've gotten feedback from the developer community I'll send this
> > > >   to the user community as well.
> > > >
> > > >   Thanks,
> > > >
> > > >   Andy
> > > >   --
> > > >   www.superlinksoftware.com
> > > >   www.sourceforge.net/projects/poi - port of Excel format to java
> > > >
> > > >   http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> > > >   - fix java generics!
> > > >
> > > >
> > > >   The avalanche has already started. It is too late for the pebbles to
> > > >   vote.
> > > >   -Ambassador Kosh
> > > >
> > > >
> > > >   --
> > > >   To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > > >   For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > ----
> >
>
>
>
>
>



