lucene-dev mailing list archives

From "Erik Hatcher" <>
Subject Re: Proposal for Lucene
Date Fri, 08 Feb 2002 03:00:35 GMT
I've developed something similar myself.  I've created an Ant task <index>
that uses classes implementing a DocumentHandler interface - one that can be
used (<index class="...">) is a FileExtensionDocumentHandler. At build-time
I generate a Lucene index of static documents, and roll that into a web

It's got some kinks, like how to deal with the documents because they contain
relative hyperlinks... so these documents either should be copied into the
WAR too (or somehow made accessible to the web app) or incorporated directly
into a Lucene field ("rawcontents" is what I'm using now).  These issues are
not tough to solve and having some additional parameters to my IndexTask
could allow such things to be customized by the user.
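
To illustrate the handler idea described above, here is a minimal sketch of
dispatching on file extension. The DocumentHandler signature is not shown in
this thread, so the interface below, the ExtensionDispatcher class and the
toFields method are all hypothetical stand-ins, and a plain Map takes the
place of a Lucene Document to keep the example self-contained.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the DocumentHandler interface mentioned above;
// the real signature is not shown in this thread.
interface DocumentHandler {
    // Returns field name -> value pairs for one file
    // (a stand-in for building a Lucene Document).
    Map<String, String> toFields(File file);
}

// Picks a handler by file extension, loosely mirroring the role described
// for FileExtensionDocumentHandler. All names here are illustrative.
class ExtensionDispatcher implements DocumentHandler {
    private final Map<String, DocumentHandler> handlers = new HashMap<>();

    // Registers a handler for an extension such as "html" (case-insensitive).
    void register(String extension, DocumentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Returns the lower-cased extension, or "" if the name has no dot.
    static String extensionOf(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    @Override
    public Map<String, String> toFields(File file) {
        DocumentHandler handler = handlers.get(extensionOf(file));
        if (handler == null) {
            return null; // no handler registered: the caller skips this file
        }
        return handler.toFields(file);
    }
}
```

A handler registered for "html" could, for instance, put the file's text into
a "rawcontents" field, which is one of the two options discussed above for
dealing with relative hyperlinks.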

My task is still evolving, but my plan all along has been to donate it to
lucene-dev for incorporation in some form or another.

Let me know if you'd like it, and what package name you'd like to use.


----- Original Message -----
From: "Kelvin Tan" <>
To: "Lucene Developers List" <>
Sent: Thursday, February 07, 2002 8:27 PM
Subject: Re: Proposal for Lucene

Great suggestions all around, and I'm pretty much in agreement with what's
been said.

In my app, I've built a mini-framework around the searching such that I'm
able to map ContentHandlers (which index file contents) to file extensions.
I've been wanting to clean it up and contribute it for a while, but haven't
overcome the inertia to do so. Also introduced a DataSource (which can
pretty much be anything, like a filesystem, a database, a URL, etc) from
which to obtain the data to index, so I think it _could_ be in line with what
some of you have in mind.
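
As a rough sketch of the DataSource idea, assuming a source only needs to
enumerate items as (identifier, text) pairs: the interface shape, the
InMemoryDataSource class and the SourceIndexer helper below are hypothetical
and are not taken from the code described in this message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape for the DataSource idea: anything that can enumerate
// items to index. This is an assumption, not the author's actual API.
interface DataSource {
    // Maps an item identifier (path, URL, primary key, ...) to its text.
    Map<String, String> fetchAll();
}

// Simplest possible source, backed by memory. A filesystem, database or URL
// source would implement the same interface.
class InMemoryDataSource implements DataSource {
    private final Map<String, String> items = new LinkedHashMap<>();

    InMemoryDataSource add(String id, String text) {
        items.put(id, text);
        return this;
    }

    @Override
    public Map<String, String> fetchAll() {
        return new LinkedHashMap<>(items); // defensive copy, insertion order kept
    }
}

class SourceIndexer {
    // Collects the ids of items with non-empty text. A real indexer would
    // add each item to a Lucene index instead of a list.
    static List<String> index(DataSource source) {
        List<String> indexed = new ArrayList<>();
        for (Map.Entry<String, String> entry : source.fetchAll().entrySet()) {
            if (!entry.getValue().isEmpty()) {
                indexed.add(entry.getKey());
            }
        }
        return indexed;
    }
}
```

The point of the indirection is that the indexing loop never needs to know
where the data came from.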

I could also use a lot of feedback with what's been done too...

So what's the plan to move forward?

  ----- Original Message -----
  From: Mark Tucker
  To: Lucene Developers List
  Sent: Friday, February 08, 2002 4:03 AM
  Subject: RE: Proposal for Lucene

  I like what you included in your proposal and suggest doing all that (over
time) and taking the following into consideration:


  General Settings
  SleeptimeBetweenCalls - can be used to avoid flooding a machine with too
many requests
  IndexerTimeout - kill this crawler thread after long period of inactivity
  IncludeFilter - include only items matching filter
  ExcludeFilter - exclude items matching filter (can be used with
  MaxItems - stops indexing after x items
  MaxMegs - stops indexing after x MB of data

  File System Indexer
  URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
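
One way URLReplacePrefix could work, as a sketch: strip the crawled path
prefix, convert backslashes to forward slashes, and prepend the public URL
prefix. The UrlPrefixMapper class is hypothetical, the prefix match is
case-sensitive, and no URL encoding is applied.

```java
// Hypothetical helper showing the URLReplacePrefix idea.
class UrlPrefixMapper {
    private final String pathPrefix;
    private final String urlPrefix;

    UrlPrefixMapper(String pathPrefix, String urlPrefix) {
        this.pathPrefix = pathPrefix;
        this.urlPrefix = urlPrefix;
    }

    // Maps a crawled file path to the URL that should be stored in the index.
    String toUrl(String path) {
        if (!path.startsWith(pathPrefix)) {
            throw new IllegalArgumentException("Path is outside the crawled prefix: " + path);
        }
        // Drop the local prefix and normalize Windows separators.
        String rest = path.substring(pathPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }
}
```

With a path prefix of c:\ and the URL prefix above, the path
c:\guide\intro.html would map to http://mysever/docs/guide/intro.html.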

  Web Indexer

  Other Possible Indexers
  Microsoft Exchange 5.5/2000
  Lotus Notes
  Newsgroup (NNTP)
  XML - index single XML that represents multiple documents

  Document Factory
  The minimum properties for each document should be:
  Full Text

  Support for META tags including Dublin Core syntax

  Other Possible Document Factories
  Office Docs - DOC, XLS, PPT

  Thanks for the great proposal.

  Mark Tucker

  -----Original Message-----
  From: Andrew C. Oliver []
  Sent: Thursday, February 07, 2002 5:35 AM
  To: Lucene Developers List
  Subject: Proposal for Lucene

  Hi All,

  These are just a few thoughts about Lucene.  Please send me your feedback,
  critiques and thoughts.

  If you folks would take a look:

  if you'd like to submit patches:

  Once I've gotten feedback from the developer community I'll send this to
  the user community as well.


  -- - port of Excel format to java
  - fix java generics!

  The avalanche has already started. It is too late for the pebbles to
  -Ambassador Kosh

  To unsubscribe, e-mail:
  For additional commands, e-mail:
