lucene-dev mailing list archives

From "Andrew C. Oliver" <acoli...@apache.org>
Subject Re: Proposal for Lucene / new component
Date Sun, 24 Feb 2002 16:59:36 GMT
On Thu, 2002-02-14 at 06:42, Manfred Schäfer wrote:
> Hi,
> 
> 
> > I think it's redundant to hardcode the indexing logic into every crawler
> > component (ftp, http, jdbc, filesys crawler). An interesting question is
> > how the components can communicate. (Don't you think using Avalon is a
> > good way?)
> 
> I've just had a look at Avalon, and it looks promising.
> 
> As I've written before, I am thinking of three different component types:
> sources, transformers and the indexer (Lucene). I thought a little about a
> flexible way to configure the indexing procedure, and it seems there could
> be many ways of combining sources, transformers and Lucene. What do you
> think about using a blackboard design pattern? Sources produce records into
> a central repository. Transformers register for records with a particular
> signature and receive those records for transformation. Finally, when no
> transformer wants a record anymore, it is delivered to Lucene.
> 

Right, not sure I want to start out that way though.  Just adding
content handling, location abstraction, etc. is tough enough for the
first iteration.  Handling an inter-machine communication process makes
it more complex and gives poorer performance on the smaller jobs
(compiling *hello world* would be slower on a Beowulf cluster).  Once we
have the basic case working we can grow out from there.
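
For reference, the blackboard idea quoted above can be sketched entirely
in-process, with no inter-machine communication. This is only a rough
illustration under assumptions: CrawlRecord, Blackboard, register() and
post() are hypothetical names, not existing Lucene or Avalon classes, and
the "delivered" list merely stands in for handing the record to Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical record type; "type" stands in for the record's signature. */
final class CrawlRecord {
    final String type;
    final String content;

    CrawlRecord(String type, String content) {
        this.type = type;
        this.content = content;
    }
}

/** Hypothetical in-process blackboard; not an existing Lucene or Avalon class. */
final class Blackboard {
    // Guard against transformers that keep claiming each other's output forever.
    private static final int MAX_STEPS = 100;

    /** A transformer together with the signature of the records it wants. */
    private static final class Registration {
        final Predicate<CrawlRecord> signature;
        final UnaryOperator<CrawlRecord> transformer;

        Registration(Predicate<CrawlRecord> signature, UnaryOperator<CrawlRecord> transformer) {
            this.signature = signature;
            this.transformer = transformer;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();
    // Stand-in for the indexer: records that nobody wants to transform anymore.
    private final List<CrawlRecord> delivered = new ArrayList<>();

    /** Transformers register for records matching a signature. */
    void register(Predicate<CrawlRecord> signature, UnaryOperator<CrawlRecord> transformer) {
        registrations.add(new Registration(signature, transformer));
    }

    /** Sources post records; each record is transformed until no transformer claims it. */
    void post(CrawlRecord record) {
        CrawlRecord current = record;
        for (int step = 0; step < MAX_STEPS; step++) {
            Registration match = firstMatch(current);
            if (match == null) {
                delivered.add(current); // nobody wants it: hand it to the indexer
                return;
            }
            current = match.transformer.apply(current);
        }
        throw new IllegalStateException("record still claimed after " + MAX_STEPS + " transformations");
    }

    private Registration firstMatch(CrawlRecord record) {
        for (Registration r : registrations) {
            if (r.signature.test(record)) {
                return r;
            }
        }
        return null;
    }

    List<CrawlRecord> delivered() {
        return delivered;
    }
}
```

A source would call post() for each record it finds, and a transformer
that turns "html" records into "text" records would register with a
predicate on the type field. Whether routing like this is worth it for the
first iteration is exactly the open question.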

> btw: it would be nice if the index could be kept in sync with the indexed
> data. If files were deleted, the index entries should also be deleted.
> 

On subsequent runs of the crawler, yes.  That sounds good.  Not sure I'd
say that happens immediately (unless you're writing a filesystem driver
or something).
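
A minimal sketch of how a later crawler run could work out what to delete,
assuming the crawler keeps the set of keys (paths, URLs) it indexed
previously. IndexSync and staleEntries() are hypothetical names, and the
actual removal from the Lucene index is deliberately left out.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; it does not touch a Lucene index itself. */
final class IndexSync {
    private IndexSync() {}

    /**
     * Returns the keys that were indexed on an earlier run but were not
     * seen during the current run, i.e. the entries that should be deleted.
     */
    static Set<String> staleEntries(Set<String> indexed, Set<String> seenThisRun) {
        Set<String> stale = new TreeSet<>(indexed); // copy, so the inputs stay untouched
        stale.removeAll(seenThisRun);
        return stale;
    }
}
```

The crawler would then delete the index entry for each returned key and
store the keys seen on this run for the next one.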

> regards,
> 
> manfred
> 
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> 
-- 
http://www.superlinksoftware.com
http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document 
                            format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
			- fix java generics!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh



