lucene-dev mailing list archives

From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: Proposal for Lucene
Date Fri, 08 Feb 2002 01:27:58 GMT
Great suggestions all around, and I'm pretty much in agreement with what's been said.

In my app, I've built a mini-framework around the searching such that I'm able to map ContentHandlers
(which index file contents) to file extensions. I've been wanting to clean it up and contribute
it for a while, but haven't overcome the inertia to do so. I also introduced a DataSource (which
can be pretty much anything: a filesystem, a database, a URL, etc.) from which to obtain
the data to index, so I think it _could_ be in line with what some of you have in mind.
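
To make the idea concrete, here is a minimal sketch of how such a mapping might look. Only the names ContentHandler and DataSource come from the description above; the method signatures, HandlerRegistry and PlainTextHandler are assumptions made for illustration and are not the actual framework code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical shape: anything that can supply a named item to index
// (a file, a database row, a URL, ...).
interface DataSource {
    String name();                      // e.g. "notes.txt"
    Reader open() throws IOException;   // the item's contents
}

// Hypothetical shape: turns the contents of one item into indexable text.
interface ContentHandler {
    String extractText(Reader in) throws IOException;
}

// Hypothetical handler for plain text: reads the whole stream into a String.
final class PlainTextHandler implements ContentHandler {
    public String extractText(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}

// Hypothetical registry mapping file extensions to handlers.
final class HandlerRegistry {
    private final Map<String, ContentHandler> byExtension = new HashMap<>();

    void register(String extension, ContentHandler handler) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Returns null when the name has no extension or no handler is registered for it.
    ContentHandler lookup(String itemName) {
        int dot = itemName.lastIndexOf('.');
        if (dot < 0 || dot == itemName.length() - 1) {
            return null;
        }
        String ext = itemName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.get(ext);
    }

    // Extracts text from one item, or returns null if no handler matches.
    String extract(DataSource source) throws IOException {
        ContentHandler handler = lookup(source.name());
        if (handler == null) {
            return null;
        }
        try (Reader in = source.open()) {
            return handler.extractText(in);
        }
    }
}
```

The extracted text would then be handed to whatever builds the Lucene document; that step is left out because it depends on decisions not yet made in this thread.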

I could also use a lot of feedback on what's been done too...

So what's the plan to move forward?

K 
  ----- Original Message ----- 
  From: Mark Tucker 
  To: Lucene Developers List 
  Sent: Friday, February 08, 2002 4:03 AM
  Subject: RE: Proposal for Lucene


  I like what you included in your proposal and suggest doing all that (over time) and taking
the following into consideration:

  Indexers/Crawlers

  General Settings
  SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
  IndexerTimeout - kill this crawler thread after a long period of inactivity
  IncludeFilter - include only items matching filter
  ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
  MaxItems - stops indexing after x items
  MaxMegs - stops indexing after x MB of data
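
One way the settings above could fit together is sketched below. The class name CrawlSettings, the choice of regular expressions for the filters, and the rule that a limit is hit once either count is reached are all assumptions for illustration; the list above does not specify them. IndexerTimeout is omitted because it concerns thread management rather than per-item decisions.

```java
import java.util.regex.Pattern;

// Hypothetical holder for the general crawler settings.
final class CrawlSettings {
    final long sleepMillisBetweenCalls;  // SleeptimeBetweenCalls
    final Pattern includeFilter;         // IncludeFilter; null means "include everything"
    final Pattern excludeFilter;         // ExcludeFilter; null means "exclude nothing"
    final int maxItems;                  // MaxItems
    final long maxBytes;                 // MaxMegs, converted to bytes

    CrawlSettings(long sleepMillisBetweenCalls, String includeRegex,
                  String excludeRegex, int maxItems, int maxMegs) {
        this.sleepMillisBetweenCalls = sleepMillisBetweenCalls;
        this.includeFilter = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.excludeFilter = excludeRegex == null ? null : Pattern.compile(excludeRegex);
        this.maxItems = maxItems;
        this.maxBytes = maxMegs * 1024L * 1024L;
    }

    // An item is crawled only if it matches the include filter (when set)
    // and does not match the exclude filter (when set).
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).find()) {
            return false;
        }
        return excludeFilter == null || !excludeFilter.matcher(item).find();
    }

    // True once either the item limit or the size limit has been reached.
    boolean limitReached(int itemsIndexed, long bytesIndexed) {
        return itemsIndexed >= maxItems || bytesIndexed >= maxBytes;
    }
}
```

A crawler loop would call accepts() before fetching each item, sleep for sleepMillisBetweenCalls between requests, and stop as soon as limitReached() returns true.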

  File System Indexer
  URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
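
A rough sketch of the prefix replacement follows. The class UrlPrefixMapper, the case-insensitive prefix match, and the host name intranet.example are assumptions for illustration; the sketch also does not URL-encode special characters, which a real implementation would need to handle.

```java
// Hypothetical helper: rewrites a crawled local path into the URL to expose.
final class UrlPrefixMapper {
    private final String localPrefix;  // e.g. "c:\\"
    private final String urlPrefix;    // e.g. "http://intranet.example/docs/"

    UrlPrefixMapper(String localPrefix, String urlPrefix) {
        this.localPrefix = localPrefix;
        this.urlPrefix = urlPrefix;
    }

    String toUrl(String localPath) {
        // Case-insensitive match, assuming Windows-style paths as in the example.
        if (!localPath.regionMatches(true, 0, localPrefix, 0, localPrefix.length())) {
            throw new IllegalArgumentException("Path is outside the crawled prefix: " + localPath);
        }
        // Drop the local prefix and switch to forward slashes.
        String rest = localPath.substring(localPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }
}
```

With these assumed values, `new UrlPrefixMapper("c:\\", "http://intranet.example/docs/").toUrl("c:\\reports\\q1.html")` yields `http://intranet.example/docs/reports/q1.html`.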

  Web Indexer
  HTTPUser
  HTTPPassword
  HTTPUserAgent
  ProxyServer
  ProxyUser
  ProxyPassword
  HTTPSCertificate
  HTTPSPrivateKey

  Other Possible Indexers
  Microsoft Exchange 5.5/2000
  Lotus Notes
  Newsgroup (NNTP)
  Documentum
  ODBC/OLEDB
  XML - index single XML that represents multiple documents


  Document Factory 
  General
  The minimum properties for each document should be:
  URL
  Title
  Abstract
  Full Text
  Score
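
As an illustration only, the minimum properties could be carried by a small value class like the one below. The name SearchDocument, the field types, and deriving the abstract from the first characters of the full text are assumptions, not something specified above. Score is treated as a search-time value, so it starts at zero and is filled in per query.

```java
// Hypothetical value class holding the minimum properties of a document.
final class SearchDocument {
    final String url;
    final String title;
    final String abstractText;  // "abstract" is a reserved word in Java
    final String fullText;
    final float score;          // only meaningful for a search hit

    SearchDocument(String url, String title, String abstractText,
                   String fullText, float score) {
        this.url = url;
        this.title = title;
        this.abstractText = abstractText;
        this.fullText = fullText;
        this.score = score;
    }

    // Builds a document whose abstract is the first abstractLength
    // characters of the full text (assumed fallback when none is supplied).
    static SearchDocument of(String url, String title, String fullText, int abstractLength) {
        String summary = fullText.length() <= abstractLength
                ? fullText
                : fullText.substring(0, abstractLength);
        return new SearchDocument(url, title, summary, fullText, 0.0f);
    }

    // Returns a copy carrying the score assigned by a particular search.
    SearchDocument withScore(float newScore) {
        return new SearchDocument(url, title, abstractText, fullText, newScore);
    }
}
```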

  HTML
  Support for META tags including Dublin Core syntax

  Other Possible Document Factories
  Office Docs - DOC, XLS, PPT
  PDF


  Thanks for the great proposal.

  Mark Tucker


  -----Original Message-----
  From: Andrew C. Oliver [mailto:acoliver@apache.org]
  Sent: Thursday, February 07, 2002 5:35 AM
  To: Lucene Developers List
  Subject: Proposal for Lucene


  Hi All,

  These are just a few thoughts about Lucene.  Please send me your feedback,
  critiques and thoughts.

  If you folks would take a look:

  http://www.trilug.org/~acoliver/luceneplan.html

  if you'd like to submit patches:

  http://www.trilug.org/~acoliver/luceneplan.xml

  Once I've gotten feedback from the developer community I'll send this to
  the user community as well.

  Thanks,

  Andy
  -- 
  www.superlinksoftware.com
  www.sourceforge.net/projects/poi - port of Excel format to java
  http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
  - fix java generics!


  The avalanche has already started. It is too late for the pebbles to
  vote.
  -Ambassador Kosh


  --
  To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
  For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

