lucene-dev mailing list archives

From "Andrew C. Oliver" <acoli...@apache.org>
Subject RE: Proposal for Lucene
Date Sun, 24 Feb 2002 16:02:10 GMT
Mark,

I implemented your suggestions.  I didn't add URLReplaceIndex, as that
was already in the AbstractIndexer (base context etc.); maybe I should
clarify it. Suggestions?  I also didn't add the *standard fields*, as
I'm not sure that is appropriate; I think we should make that
configurable. I'm open to more discussion on that.

I also did not add *other document factories* --  I don't want to list
every possible one.  The proposal was only meant to give a few examples
for illustration.  (Perhaps that should be stated more clearly?)

The "Web Indexer" -- perhaps this shows the need for more settings, and
maybe we need a further extraction that pulls all of the sources
together, aside from the crawling datasource handler?  Does anyone have
any suggestions on that?

This is now in CVS... take a look when you get the chance and make sure
I didn't leave out anything that I shouldn't have.

On Thu, 2002-02-07 at 15:03, Mark Tucker wrote:
> I like what you included in your proposal and suggest doing all that (over time) and
> taking the following into consideration:
> 
> Indexers/Crawlers
> 
> 	General Settings
> 		SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
> 		IndexerTimeout - kill this crawler thread after long period of inactivity
> 		IncludeFilter - include only items matching filter
> 		ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
> 		MaxItems - stops indexing after x items
> 		MaxMegs - stops indexing after x MB of data
> 
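As an illustration of how a few of the general settings above might fit together, here is a minimal sketch. It is not code from the proposal or from Lucene: the class name CrawlSettings, its method names, and the choice of regular expressions for the filters are all assumptions made for this example. It covers only IncludeFilter, ExcludeFilter and MaxItems.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for some of the general crawler settings (illustration only). */
final class CrawlSettings {
    private final Pattern includeFilter; // null means "include everything"
    private final Pattern excludeFilter; // null means "exclude nothing"
    private final int maxItems;          // stop indexing after this many items
    private int indexedItems = 0;

    CrawlSettings(String includeRegex, String excludeRegex, int maxItems) {
        this.includeFilter = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.excludeFilter = excludeRegex == null ? null : Pattern.compile(excludeRegex);
        this.maxItems = maxItems;
    }

    /** True if the item matches IncludeFilter (when set) and does not match ExcludeFilter. */
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).find()) {
            return false;
        }
        if (excludeFilter != null && excludeFilter.matcher(item).find()) {
            return false;
        }
        return true;
    }

    /** Counts one indexed item; returns false once MaxItems has already been reached. */
    boolean recordItem() {
        if (indexedItems >= maxItems) {
            return false;
        }
        indexedItems++;
        return true;
    }
}
```

A crawler loop would call accepts() before fetching an item and stop as soon as recordItem() returns false. Settings such as SleeptimeBetweenCalls and IndexerTimeout are left out because they depend on how the crawler threads are scheduled.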
> 	File System Indexer
> 		URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
> 		
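The URLReplacePrefix idea can be sketched as a simple prefix substitution. This is only an illustration under assumptions: UrlPrefixMapper and toUrl are hypothetical names, the example host is a placeholder, and the existing AbstractIndexer may handle this differently.

```java
/** Hypothetical mapper from a crawled file-system path to the URL exposed in the index. */
final class UrlPrefixMapper {
    private final String crawlPrefix; // e.g. the directory that was crawled
    private final String urlPrefix;   // e.g. the web location that serves those files

    UrlPrefixMapper(String crawlPrefix, String urlPrefix) {
        this.crawlPrefix = crawlPrefix;
        this.urlPrefix = urlPrefix;
    }

    /** Replaces the crawl prefix with the URL prefix; paths outside the prefix are returned unchanged. */
    String toUrl(String crawledPath) {
        if (!crawledPath.startsWith(crawlPrefix)) {
            return crawledPath;
        }
        // Normalize Windows separators so the remainder is usable in a URL.
        String rest = crawledPath.substring(crawlPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }
}
```

For instance, with a crawl prefix of c:\ and a URL prefix of http://example.org/docs/, the path c:\guides\intro.html would be exposed as http://example.org/docs/guides/intro.html. The sketch does no URL escaping of special characters, which a real indexer would need.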
> 	Web Indexer
> 		HTTPUser
> 		HTTPPassword
> 		HTTPUserAgent
> 		ProxyServer
> 		ProxyUser
> 		ProxyPassword
> 		HTTPSCertificate
> 		HTTPSPrivateKey
> 
> 	Other Possible Indexers
> 		Microsoft Exchange 5.5/2000
> 		Lotus Notes
> 		Newsgroup (NNTP)
> 		Documentum
> 		ODBC/OLEDB
> 		XML - index single XML that represents multiple documents
> 
> 
> Document Factory		
> 	General
> 		The minimum properties for each document should be:
> 			URL
> 			Title
> 			Abstract
> 			Full Text
> 			Score
> 
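The minimum property set listed above could be represented by a small value class like the one below. This is a sketch only: IndexedDocument and its field names are hypothetical and do not correspond to any existing Lucene class, and the field types are assumptions.

```java
/** Hypothetical value class carrying the minimum properties suggested for each document. */
final class IndexedDocument {
    final String url;
    final String title;
    final String abstractText; // "abstract" is a reserved word in Java
    final String fullText;
    final float score;         // assumed to be assigned at search time

    IndexedDocument(String url, String title, String abstractText,
                    String fullText, float score) {
        this.url = url;
        this.title = title;
        this.abstractText = abstractText;
        this.fullText = fullText;
        this.score = score;
    }
}
```

If the set of standard fields is to be configurable instead of fixed, a map from field name to value would be the more natural shape; the fixed class above only mirrors the list as suggested.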
> 	HTML
> 		Support for META tags including Dublin Core syntax
> 
> 	Other Possible Document Factories
> 		Office Docs - DOC, XLS, PPT
> 		PDF
> 		
> 
> Thanks for the great proposal.
> 
> Mark Tucker
> 			
> 
> -----Original Message-----
> From: Andrew C. Oliver [mailto:acoliver@apache.org]
> Sent: Thursday, February 07, 2002 5:35 AM
> To: Lucene Developers List
> Subject: Proposal for Lucene
> 
> 
> Hi All,
> 
> These are just a few thoughts about Lucene.  Please send me your feedback,
> critiques and thoughts.
> 
> If you folks would take a look:
> 
> http://www.trilug.org/~acoliver/luceneplan.html
> 
> if you'd like to submit patches:
> 
> http://www.trilug.org/~acoliver/luceneplan.xml
> 
> Once I've gotten feedback from the developer community I'll send this to
> the user community as well.
> 
> Thanks,
> 
> Andy
> -- 
> www.superlinksoftware.com
> www.sourceforge.net/projects/poi - port of Excel format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
> 			- fix java generics!
> 
> 
> The avalanche has already started. It is too late for the pebbles to
> vote.
> -Ambassador Kosh
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> 
> 
-- 
http://www.superlinksoftware.com
http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document 
                            format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
			- fix java generics!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh



