lucene-dev mailing list archives

From Brian Goetz <br...@quiotix.com>
Subject Re: Configuration RFC
Date Mon, 15 Jul 2002 03:53:58 GMT
> Source types handled (HTTP, FTP, FILE, SQL?)

These can basically be handled with URLs (certainly the first three).
The crawler should generate a list of document URLs to be indexed. The
indexer then goes and gathers the actual documents later, and you
should be able to throttle it so it doesn't take up excessive
resources.
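
A rough sketch of the crawler/indexer split described above, in Python
purely for illustration. Every name here (UrlQueue, run_indexer,
min_interval, and the fetch/index callbacks) is hypothetical and not
part of Lucene or of any existing proposal. It assumes the crawler only
records URLs, and that throttling means enforcing a minimum interval
between fetches.

```python
import time
from collections import deque
from urllib.parse import urlparse

# Assumption: only the three URL-addressable source types from the thread.
SUPPORTED_SCHEMES = {"http", "ftp", "file"}


class UrlQueue:
    """List of document URLs produced by the crawler (hypothetical)."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def add(self, url):
        """Queue a URL; return False for unsupported schemes or duplicates."""
        scheme = urlparse(url).scheme.lower()
        if scheme not in SUPPORTED_SCHEMES or url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def pop(self):
        """Return the next URL, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


def run_indexer(queue, fetch, index, min_interval=0.0,
                sleep=time.sleep, clock=time.monotonic):
    """Drain the queue, starting at most one fetch per min_interval seconds.

    fetch(url) returns the document body and index(url, body) stores it;
    both are supplied by the caller. sleep and clock are injectable so the
    throttle can be exercised without real delays.
    """
    processed = 0
    last_start = None
    while True:
        url = queue.pop()
        if url is None:
            break
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # throttle: back off before the next fetch
        last_start = clock()
        index(url, fetch(url))
        processed += 1
    return processed
```

Keeping the queue separate from the indexer means the crawl can finish
quickly while the expensive fetching happens later, at whatever rate
the machine can spare.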

Having a framework for dealing with multiple file types (text, HTML,
PDF, Word, etc.) is critical.  There was a proposal floating around a
few months ago that should be dusted off.
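
One possible shape for such a framework, again as a hypothetical Python
sketch and not a reconstruction of the earlier proposal: a registry
that maps file extensions to text extractors. The names HANDLERS,
register_handler, and extract_text are made up for illustration. Only
plain text and HTML are handled here, using the standard library; PDF
and Word would need third-party parsers registered by the caller.

```python
import os
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data between tags (script/style text included)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_plain(raw):
    """Decode bytes as UTF-8 text, replacing undecodable bytes."""
    return raw.decode("utf-8", errors="replace")


def extract_html(raw):
    """Strip tags and collapse runs of whitespace to single spaces."""
    collector = _TextCollector()
    collector.feed(raw.decode("utf-8", errors="replace"))
    collector.close()
    return " ".join(" ".join(collector.parts).split())


# Hypothetical registry: lowercase file extension -> extractor function.
HANDLERS = {
    ".txt": extract_plain,
    ".html": extract_html,
    ".htm": extract_html,
}


def register_handler(extension, extractor):
    """Add or replace the extractor for an extension such as '.pdf'."""
    HANDLERS[extension.lower()] = extractor


def extract_text(path, raw):
    """Pick an extractor by file extension and return indexable text."""
    extension = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(extension)
    if handler is None:
        raise ValueError("no handler registered for %r" % extension)
    return handler(raw)
```

Dispatching on extension is the simplest choice; a fuller design would
probably prefer the MIME type reported by the server when one is
available.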



