lucene-dev mailing list archives

From "Mark Tucker" <MTuc...@infoimage.com>
Subject RE: Proposal for Lucene
Date Thu, 07 Feb 2002 20:03:47 GMT
I like what you included in your proposal and suggest doing all that (over time) and taking
the following into consideration:

Indexers/Crawlers

	General Settings
		SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
		IndexerTimeout - kill this crawler thread after a long period of inactivity
		IncludeFilter - include only items matching filter
		ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
		MaxItems - stops indexing after x items
		MaxMegs - stops indexing after x MB of data
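As a rough illustration, the general settings above could be gathered into a single settings object that a crawler consults before fetching each item. This is a minimal sketch. The class name `CrawlerSettings`, its field names, types, defaults and helper methods are all hypothetical. It also assumes the include/exclude filters are regular expressions, which the proposal does not specify.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for the general crawler settings; names, types and defaults are assumptions. */
public class CrawlerSettings {
    long sleepTimeBetweenCallsMillis = 0;   // SleeptimeBetweenCalls
    long indexerTimeoutMillis = 60_000;     // IndexerTimeout
    Pattern includeFilter = null;           // IncludeFilter (null = include everything)
    Pattern excludeFilter = null;           // ExcludeFilter (null = exclude nothing)
    long maxItems = Long.MAX_VALUE;         // MaxItems
    long maxMegs = Long.MAX_VALUE;          // MaxMegs

    /** An item is indexed only if it matches the include filter and does not match the exclude filter. */
    boolean accepts(String location) {
        boolean included = includeFilter == null || includeFilter.matcher(location).find();
        boolean excluded = excludeFilter != null && excludeFilter.matcher(location).find();
        return included && !excluded;
    }

    /** True once either the item count or the indexed megabytes reach their limit. */
    boolean limitReached(long itemsIndexed, long bytesIndexed) {
        long megsIndexed = bytesIndexed / (1024 * 1024);
        return itemsIndexed >= maxItems || megsIndexed >= maxMegs;
    }

    /** True if the crawler has been inactive for longer than IndexerTimeout. */
    boolean timedOut(long lastActivityMillis, long nowMillis) {
        return nowMillis - lastActivityMillis > indexerTimeoutMillis;
    }

    /** Called between requests so a single machine is not flooded. */
    void pauseBetweenCalls() throws InterruptedException {
        if (sleepTimeBetweenCallsMillis > 0) {
            Thread.sleep(sleepTimeBetweenCallsMillis);
        }
    }
}
```

Keeping the filters and limits in one place means every indexer type (file system, web, and so on) can share the same checks.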

	File System Indexer
		URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
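URLReplacePrefix amounts to rewriting the crawl root into a URL prefix. The sketch below shows one way to do this. The class and method names are hypothetical, and it assumes the prefix ends with '/' and that the file lies under the crawl root. Spaces and other reserved characters are not percent-encoded here.

```java
import java.nio.file.Path;

/** Hypothetical helper: expose files crawled under a local root at an HTTP prefix. */
public class UrlReplacePrefix {
    /** Maps a file under crawlRoot to a URL under urlPrefix (assumed to end with '/'). */
    static String toUrl(Path crawlRoot, Path file, String urlPrefix) {
        Path relative = crawlRoot.relativize(file);
        StringBuilder url = new StringBuilder(urlPrefix);
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) {
                url.append('/');   // always use '/' regardless of the platform separator
            }
            url.append(relative.getName(i).toString());
        }
        return url.toString();
    }
}
```

On Windows, for example, a crawl root of c:\ and a file c:\reports\q1.html would map to http://mysever/docs/reports/q1.html when the prefix is http://mysever/docs/.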
		
	Web Indexer
		HTTPUser
		HTTPPassword
		HTTPUserAgent
		ProxyServer
		ProxyUser
		ProxyPassword
		HTTPSCertificate
		HTTPSPrivateKey
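To show where several of the Web Indexer settings would plug in, here is a sketch built on the JDK's java.net.http client (Java 11+). The class and method names are hypothetical. The sketch covers only HTTPUser/HTTPPassword (sent as HTTP Basic authentication), HTTPUserAgent and ProxyServer. ProxyUser/ProxyPassword would typically go through an `Authenticator`, and HTTPSCertificate/HTTPSPrivateKey through an `SSLContext` passed to `HttpClient.Builder.sslContext`. Neither is shown.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper mapping some Web Indexer settings onto java.net.http. */
public class WebIndexerRequests {

    /** HTTPUser/HTTPPassword encoded as an HTTP Basic "Authorization" header value. */
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** ProxyServer: route requests through host:port when a proxy host is given. */
    static HttpClient client(String proxyHost, int proxyPort) {
        HttpClient.Builder builder = HttpClient.newBuilder();
        if (proxyHost != null) {
            builder.proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)));
        }
        return builder.build();
    }

    /** A GET request carrying HTTPUserAgent and, if a user is set, Basic credentials. */
    static HttpRequest request(String url, String userAgent, String user, String password) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .GET()
                .header("User-Agent", userAgent);
        if (user != null) {
            builder.header("Authorization", basicAuth(user, password));
        }
        return builder.build();
    }
}
```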

	Other Possible Indexers
		Microsoft Exchange 5.5/2000
		Lotus Notes
		Newsgroup (NNTP)
		Documentum
		ODBC/OLEDB
		XML - index a single XML file that represents multiple documents
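Indexing a single XML file that holds many documents means splitting it into one record per document. The following is a minimal DOM-based sketch. The class name, the record element name (`doc` in the usage) and the flat child-element layout are assumptions, not part of the proposal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical splitter: each recordTag element becomes one document of field name -> text. */
public class XmlSplitter {
    static List<Map<String, String>> split(String xml, String recordTag) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        List<Map<String, String>> docs = new ArrayList<>();
        NodeList records = dom.getElementsByTagName(recordTag);
        for (int i = 0; i < records.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = records.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Only direct child elements become fields; whitespace text nodes are skipped.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            docs.add(fields);
        }
        return docs;
    }
}
```

Each resulting map could then be handed to the document factory as if it were a separately crawled item.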


Document Factory		
	General
		The minimum properties for each document should be:
			URL
			Title
			Abstract
			Full Text
			Score
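The minimum property list above can be pictured as a simple value type. This is a sketch only. The record name is hypothetical, `abstract` is renamed `abstractText` because it is a Java keyword, and the null handling is an assumption. The score is usually assigned per query rather than at indexing time, so it is shown here on the search-result side.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/** Hypothetical value type holding the minimum properties of each document. */
public record SearchDocument(String url, String title, String abstractText,
                             String fullText, float score) {

    public SearchDocument {
        Objects.requireNonNull(url, "url");   // every hit must link somewhere
        title = (title == null) ? "" : title;
        abstractText = (abstractText == null) ? "" : abstractText;
        fullText = (fullText == null) ? "" : fullText;
    }

    /** Returns the hits ordered from highest to lowest score. */
    public static List<SearchDocument> rank(List<SearchDocument> hits) {
        Comparator<SearchDocument> byScore = Comparator.comparingDouble(SearchDocument::score);
        return hits.stream().sorted(byScore.reversed()).toList();
    }
}
```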

	HTML
		Support for META tags, including Dublin Core syntax
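Dublin Core metadata in HTML is commonly written as META tags whose names carry a "DC." prefix, for example name="DC.title". The sketch below collects name/content pairs with a regular expression. The class name is hypothetical, and the regex is a deliberate simplification. It expects the name attribute before content and does not handle quotes nested inside values. A real HTML parser would be more robust.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical META tag reader; keys are lower-cased so DC.Title and dc.title collide. */
public class MetaTagExtractor {
    // Simplification: matches <meta ... name="..." ... content="..."> with name before content.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+[^>]*?name\\s*=\\s*[\"']([^\"']+)[\"'][^>]*?content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    static Map<String, String> extract(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            String key = m.group(1).trim().toLowerCase(Locale.ROOT);
            meta.putIfAbsent(key, m.group(2).trim());   // the first occurrence wins
        }
        return meta;
    }
}
```

Tags without a name attribute, such as http-equiv tags, are not matched by this pattern.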

	Other Possible Document Factories
		Office Docs - DOC, XLS, PPT
		PDF
		

Thanks for the great proposal.

Mark Tucker
			

-----Original Message-----
From: Andrew C. Oliver [mailto:acoliver@apache.org]
Sent: Thursday, February 07, 2002 5:35 AM
To: Lucene Developers List
Subject: Proposal for Lucene


Hi All,

These are just a few thoughts about Lucene. Please send me your feedback,
critiques and thoughts.

If you folks would take a look:

http://www.trilug.org/~acoliver/luceneplan.html

If you'd like to submit patches:

http://www.trilug.org/~acoliver/luceneplan.xml

Once I've gotten feedback from the developer community I'll send this to
the user community as well.

Thanks,

Andy
-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to Java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
			- fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh


--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

