lucene-dev mailing list archives

From "Erik Hatcher" <li...@ehatchersolutions.com>
Subject Re: Proposal for Lucene
Date Fri, 08 Feb 2002 03:00:35 GMT
I've developed something similar myself.  I've created an Ant task <index>
that uses DocumentHandler interface implementing classes - one that can be
used (<index class="...">) is a FileExtensionDocumentHandler. At build-time
I generate a Lucene index of static documents, and roll that into a web
application.

It's got some kinks, like how to deal with the documents because they contain
relative hyperlinks... so these documents either should be copied into the
WAR too (or somehow made accessible to the web app) or incorporated directly
into a Lucene field ("rawcontents" is what I'm using now).  These issues are
not tough to solve and having some additional parameters to my IndexTask
could allow such things to be customized by the user.
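
As a rough illustration of the handler arrangement described above, here is a
minimal Java sketch. The names DocumentHandler and FileExtensionDocumentHandler
and the "rawcontents" field come from this message; the method signatures, the
"path" and "contents" fields, and the use of a plain Map in place of a Lucene
document are assumptions made only so the sketch is self-contained. It is not
the actual task code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtensionDispatchSketch {

    /** Hypothetical contract: turn one file's name and text into named fields. */
    interface DocumentHandler {
        Map<String, String> toFields(String fileName, String text);
    }

    /** Hypothetical dispatcher that picks a delegate handler by file extension. */
    static class FileExtensionDocumentHandler implements DocumentHandler {
        private final Map<String, DocumentHandler> byExtension = new HashMap<>();

        void register(String extension, DocumentHandler handler) {
            byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
        }

        static String extensionOf(String fileName) {
            int dot = fileName.lastIndexOf('.');
            return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        }

        @Override
        public Map<String, String> toFields(String fileName, String text) {
            DocumentHandler handler = byExtension.get(extensionOf(fileName));
            if (handler == null) {
                return null; // no handler registered: the caller skips this file
            }
            return handler.toFields(fileName, text);
        }
    }

    /** Builds a dispatcher with a plain-text handler and a naive HTML handler. */
    static FileExtensionDocumentHandler defaultHandler() {
        FileExtensionDocumentHandler dispatcher = new FileExtensionDocumentHandler();
        dispatcher.register("txt", (name, text) -> {
            Map<String, String> fields = new HashMap<>();
            fields.put("path", name);
            fields.put("contents", text);
            return fields;
        });
        dispatcher.register("html", (name, text) -> {
            Map<String, String> fields = new HashMap<>();
            fields.put("path", name);
            // Keep the original markup so the web app can serve it back.
            fields.put("rawcontents", text);
            // Crude tag stripping, good enough for a sketch only.
            fields.put("contents", text.replaceAll("<[^>]*>", " ").trim());
            return fields;
        });
        return dispatcher;
    }

    public static void main(String[] args) {
        DocumentHandler handler = defaultHandler();
        System.out.println(handler.toFields("guide/intro.html", "<h1>Intro</h1>"));
        System.out.println(handler.toFields("archive.bin", "ignored"));
    }
}
```

Storing the markup in a field sidesteps the relative-hyperlink problem only for
display; links inside the stored markup would still need rewriting or the
linked files would still need to be reachable from the web app.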

My task is still evolving, but my plan all along has been to donate it to
lucene-dev for incorporation in some form or another.

Let me know if you'd like it, and what package name you'd like to use.

    Erik


----- Original Message -----
From: "Kelvin Tan" <kelvin@relevanz.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Thursday, February 07, 2002 8:27 PM
Subject: Re: Proposal for Lucene


Great suggestions all around, and I'm pretty much in agreement with what's
been said.

In my app, I've built a mini-framework around the searching such that I'm
able to map ContentHandlers (which index file contents) to file extensions.
I've been wanting to clean it up and contribute it for a while, but haven't
overcome the inertia to do so. I also introduced a DataSource (which can
pretty much be anything, like a filesystem, a database, a URL, etc) from
which to obtain the data to index, so I think it _could_ be in line with what
some of you have in mind.
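
One way to picture the DataSource idea is the Java sketch below. Only the name
DataSource comes from this message; the Item type, the fetch() method, the two
example sources and the indexAll() loop are assumptions for illustration, and a
plain Map stands in for the real index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DataSourceSketch {

    /** One unit of indexable data: an identifier plus its text. */
    static final class Item {
        final String id;
        final String text;

        Item(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /** Hypothetical contract: anything that can list items to index. */
    interface DataSource {
        List<Item> fetch();
    }

    /** Reads every regular file under a root directory as UTF-8 text. */
    static DataSource fileSystem(Path root) {
        return () -> {
            try (Stream<Path> paths = Files.walk(root)) {
                List<Path> files = paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
                List<Item> items = new ArrayList<>();
                for (Path p : files) {
                    String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                    items.add(new Item(root.relativize(p).toString(), text));
                }
                return items;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Serves items from a map, standing in for a database or URL source. */
    static DataSource inMemory(Map<String, String> rows) {
        return () -> rows.entrySet().stream()
                .map(e -> new Item(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    /** The indexing loop depends only on the DataSource contract. */
    static int indexAll(DataSource source, Map<String, String> index) {
        int count = 0;
        for (Item item : source.fetch()) {
            index.put(item.id, item.text);
            count++;
        }
        return count;
    }
}
```

Because indexAll() never learns where the items came from, adding a new kind of
source would not touch the indexing code.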

I could also use a lot of feedback on what's been done too...

So what's the plan to move forward?

K
  ----- Original Message -----
  From: Mark Tucker
  To: Lucene Developers List
  Sent: Friday, February 08, 2002 4:03 AM
  Subject: RE: Proposal for Lucene


  I like what you included in your proposal and suggest doing all that (over
  time) and taking the following into consideration:

  Indexers/Crawlers

  General Settings
  SleeptimeBetweenCalls - can be used to avoid flooding a machine with too
    many requests
  IndexerTimeout - kill this crawler thread after long period of inactivity
  IncludeFilter - include only items matching filter
  ExcludeFilter - exclude items matching filter (can be used with
    IncludeFilter)
  MaxItems - stops indexing after x items
  MaxMegs - stops indexing after x MB of data
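
A minimal Java sketch of how four of the settings above might interact. The
setting names come from the list; treating the filters as regular expressions,
counting 1 MB as 1024 * 1024 bytes, and the class and method names are all
assumptions. SleeptimeBetweenCalls and IndexerTimeout are left out because they
depend on how the crawler schedules its threads.

```java
import java.util.regex.Pattern;

class CrawlSettingsSketch {
    private final Pattern includeFilter; // null means "include everything"
    private final Pattern excludeFilter; // null means "exclude nothing"
    private final int maxItems;
    private final long maxBytes;

    private int itemsSeen = 0;
    private long bytesSeen = 0;

    CrawlSettingsSketch(String include, String exclude, int maxItems, long maxMegs) {
        this.includeFilter = include == null ? null : Pattern.compile(include);
        this.excludeFilter = exclude == null ? null : Pattern.compile(exclude);
        this.maxItems = maxItems;
        this.maxBytes = maxMegs * 1024L * 1024L;
    }

    /** An item passes if it matches IncludeFilter and does not match ExcludeFilter. */
    boolean matches(String name) {
        if (includeFilter != null && !includeFilter.matcher(name).find()) {
            return false;
        }
        return excludeFilter == null || !excludeFilter.matcher(name).find();
    }

    /** True once MaxItems items or MaxMegs megabytes have been accepted. */
    boolean limitReached() {
        return itemsSeen >= maxItems || bytesSeen >= maxBytes;
    }

    /** Records the item and returns true, or returns false if it must be skipped. */
    boolean accept(String name, long sizeInBytes) {
        if (limitReached() || !matches(name)) {
            return false;
        }
        itemsSeen++;
        bytesSeen += sizeInBytes;
        return true;
    }

    public static void main(String[] args) {
        CrawlSettingsSketch settings = new CrawlSettingsSketch("\\.html$", "^private/", 2, 10);
        String[] names = {"a.html", "private/b.html", "c.txt", "d.html", "e.html"};
        for (String name : names) {
            System.out.println(name + " accepted: " + settings.accept(name, 1024));
        }
    }
}
```

In this sketch the size limit is checked before each item, so the item that
crosses MaxMegs is still indexed and the crawl stops afterwards.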

  File System Indexer
  URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
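
To make URLReplacePrefix concrete, here is a small Java sketch. The setting
name and the example prefixes come from the line above; the method name, the
case-insensitive prefix match and the backslash-to-slash conversion are
assumptions, and no percent-encoding is attempted.

```java
class UrlReplacePrefixSketch {

    /**
     * Swaps a local path prefix for a URL prefix and converts Windows
     * separators to forward slashes. Hypothetical helper, not a Lucene API.
     */
    static String toUrl(String localPath, String localPrefix, String urlPrefix) {
        // Assumption: compare case-insensitively, as Windows paths usually are.
        if (!localPath.regionMatches(true, 0, localPrefix, 0, localPrefix.length())) {
            throw new IllegalArgumentException(
                    localPath + " is not under " + localPrefix);
        }
        String rest = localPath.substring(localPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }

    public static void main(String[] args) {
        System.out.println(
                toUrl("c:\\manuals\\intro.html", "c:\\", "http://mysever/docs/"));
    }
}
```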

  Web Indexer
  HTTPUser
  HTTPPassword
  HTTPUserAgent
  ProxyServer
  ProxyUser
  ProxyPassword
  HTTPSCertificate
  HTTPSPrivateKey

  Other Possible Indexers
  Microsoft Exchange 5.5/2000
  Lotus Notes
  Newsgroup (NNTP)
  Documentum
  ODBC/OLEDB
  XML - index single XML that represents multiple documents


  Document Factory
  General
  The minimum properties for each document should be:
  URL
  Title
  Abstract
  Full Text
  Score

  HTML
  Support for META tags including Dublin Core syntax
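
A rough Java sketch of META tag extraction, including Dublin Core names such
as DC.title. The class and method names are assumptions, and the regular
expression only handles double-quoted attributes with name before content; a
real document factory would more likely use an HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaTagSketch {

    // Matches <meta name="..." content="..."> in that attribute order only.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*\"([^\"]+)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns META name/content pairs in document order. */
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<head>"
                + "<meta name=\"DC.title\" content=\"Lucene Plan\">"
                + "<meta name=\"DC.creator\" content=\"Example Author\">"
                + "</head>";
        System.out.println(extract(html));
    }
}
```

Each extracted pair could then become one field of the indexed document, so
that a search can be restricted to, say, DC.creator.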

  Other Possible Document Factories
  Office Docs - DOC, XLS, PPT
  PDF


  Thanks for the great proposal.

  Mark Tucker


  -----Original Message-----
  From: Andrew C. Oliver [mailto:acoliver@apache.org]
  Sent: Thursday, February 07, 2002 5:35 AM
  To: Lucene Developers List
  Subject: Proposal for Lucene


  Hi All,

  These are just a few thoughts about Lucene.  Please send me your feedback,
  critiques and thoughts.

  If you folks would take a look:

  http://www.trilug.org/~acoliver/luceneplan.html

  if you'd like to submit patches:

  http://www.trilug.org/~acoliver/luceneplan.xml

  Once I've gotten feedback from the developer community I'll send this to
  the user community as well.

  Thanks,

  Andy
  --
  www.superlinksoftware.com
  www.sourceforge.net/projects/poi - port of Excel format to java
  http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
  - fix java generics!


  The avalanche has already started. It is too late for the pebbles to
  vote.
  -Ambassador Kosh


  --
  To unsubscribe, e-mail:
<mailto:lucene-dev-unsubscribe@jakarta.apache.org>
  For additional commands, e-mail:
<mailto:lucene-dev-help@jakarta.apache.org>

