lucene-dev mailing list archives

From: jason rutherglen <jasonhus...@yahoo.com>
Subject: Re: GData, updateable IndexSearcher
Date: Wed, 26 Apr 2006 20:11:49 GMT
This originated on the Solr mailing list.

> That's the way Lucene changes.

I thought you were implying that you knew of someone who had customized their own, but that
it was a closed-source solution.  If so, you would know how that project fared.

It definitely sounds like an interesting project; it will take me several days to digest the
design you described.  As this would be used with Solr, I wonder if there would be a good way
to also update the Solr caches.  Wouldn't there also need to be a hack on the IndexWriter
to keep track of new segments?

----- Original Message ----
From: Doug Cutting <cutting@apache.org>
To: solr-dev@lucene.apache.org
Sent: Wednesday, April 26, 2006 11:27:44 AM
Subject: Re: GData, updateable IndexSearcher

jason rutherglen wrote:
> Interesting, does this mean there is a plan for incrementally updateable
> IndexSearchers to become part of Lucene?

In general, there is no plan for Lucene.  If someone implements a 
generally useful, efficient feature in a back-compatible, easy-to-use 
manner and submits it as a patch, then it becomes a part of Lucene. 
That's the way Lucene changes.  Since we don't pay anyone, we can't make 
plans and assign tasks.  So if you're particularly interested in this 
feature, you might search the archives to find past efforts, or simply 
try to implement it yourself.

I think a good approach would be to create a new IndexSearcher instance 
based on an existing one, that shares IndexReaders.  Similarly, one 
should be able to create a new IndexReader based on an existing one. 
This would be a MultiReader that shares many of the same SegmentReaders.
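
To make the sharing idea concrete, here is a minimal sketch in plain Java. It does not use Lucene: SegmentReaderStub, MultiReaderStub, and this particular reopen signature are hypothetical stand-ins invented for illustration, and segments are identified by a simple name. The point is only that a reopened composite reader can reuse the sub-readers of unchanged segments and open readers for new segments alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReopenSketch {
    // Hypothetical stand-in for a per-segment reader; NOT Lucene's SegmentReader.
    static final class SegmentReaderStub {
        final String segmentName;

        SegmentReaderStub(String segmentName) {
            this.segmentName = segmentName;
        }
    }

    // Hypothetical stand-in for a composite reader over several segments.
    static final class MultiReaderStub {
        private final Map<String, SegmentReaderStub> bySegment = new LinkedHashMap<>();

        MultiReaderStub(List<SegmentReaderStub> subReaders) {
            for (SegmentReaderStub reader : subReaders) {
                bySegment.put(reader.segmentName, reader);
            }
        }

        List<SegmentReaderStub> getSubReaders() {
            return new ArrayList<>(bySegment.values());
        }

        // Build a new composite for the current segment list. Sub-readers of
        // unchanged segments are reused; only unseen segments get a new reader.
        static MultiReaderStub reopen(MultiReaderStub old, List<String> currentSegments) {
            List<SegmentReaderStub> subs = new ArrayList<>();
            for (String name : currentSegments) {
                SegmentReaderStub existing = old.bySegment.get(name);
                subs.add(existing != null ? existing : new SegmentReaderStub(name));
            }
            return new MultiReaderStub(subs);
        }
    }

    public static void main(String[] args) {
        MultiReaderStub old = new MultiReaderStub(Arrays.asList(
                new SegmentReaderStub("_1"), new SegmentReaderStub("_2")));
        MultiReaderStub fresh = MultiReaderStub.reopen(old, Arrays.asList("_1", "_2", "_3"));
        boolean shared = fresh.getSubReaders().get(0) == old.getSubReaders().get(0);
        System.out.println("first sub-reader shared with old reader: " + shared);
    }
}
```

In this sketch the old composite is left untouched, so searches already running against it are unaffected by the reopen.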

Things get a little tricky after this.

Lucene caches filters based on the IndexReader.  So filters would need 
to be re-created.  Ideally these could be incrementally re-created, but 
that might be difficult.  What might be simpler would be to use a 
MultiSearcher constructed with an IndexSearcher per SegmentReader, 
avoiding the use of MultiReader.  Then the caches would still work. 
This would require making a few things public that are not at present. 
Perhaps adding a 'MultiReader.getSubReaders()' method, combined with a 
static 'IndexReader.reopen(IndexReader)' method.  The latter would 
return a new MultiReader that shared SegmentReaders with the old 
version.  Then one could use getSubReaders() on the new multi reader to 
extract the current set to use when constructing a MultiSearcher.
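
The reason per-segment searchers keep the caches working can be sketched without Lucene. In the example below, SegmentStub and FilterCache are hypothetical names, a "filter" is just a substring match, and the cache is keyed on the segment object itself (an assumption that mirrors caching filters per reader). Segments shared between the old and new searcher hit the cache, so only the newly added segment has its filter bits computed.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

class PerSegmentCacheSketch {
    // Hypothetical per-segment reader: each document is a single string.
    static final class SegmentStub {
        final String[] docs;

        SegmentStub(String... docs) {
            this.docs = docs;
        }
    }

    // Caches one BitSet per segment object. SegmentStub does not override
    // equals/hashCode, so the WeakHashMap is effectively keyed by identity.
    static final class FilterCache {
        private final Map<SegmentStub, BitSet> cache = new WeakHashMap<>();
        private final String term;
        int computations = 0; // how many times filter bits were actually built

        FilterCache(String term) {
            this.term = term;
        }

        BitSet bits(SegmentStub segment) {
            BitSet cached = cache.get(segment);
            if (cached == null) {
                computations++;
                cached = new BitSet(segment.docs.length);
                for (int i = 0; i < segment.docs.length; i++) {
                    if (segment.docs[i].contains(term)) {
                        cached.set(i);
                    }
                }
                cache.put(segment, cached);
            }
            return cached;
        }

        // Total number of matching documents across the given segments.
        int countMatches(List<SegmentStub> segments) {
            int total = 0;
            for (SegmentStub segment : segments) {
                total += bits(segment).cardinality();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        SegmentStub a = new SegmentStub("lucene index", "solr cache");
        SegmentStub b = new SegmentStub("lucene reader");
        FilterCache cache = new FilterCache("lucene");
        cache.countMatches(Arrays.asList(a, b));

        // After a "reopen", a and b are shared and only c is new.
        SegmentStub c = new SegmentStub("lucene searcher", "gdata");
        cache.countMatches(Arrays.asList(a, b, c));
        System.out.println("filter computations: " + cache.computations);
    }
}
```

With a single composite reader as the cache key, the whole filter would be rebuilt on every reopen; keying per segment limits the work to segments that are actually new.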

Another tricky bit is figuring out when to close readers.
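
One possible way to handle this is reference counting, sketched below in plain Java. This is an assumption for illustration, not something proposed in the message above: SharedSegment and its incRef/decRef methods are hypothetical names. Each searcher that uses a segment holds one reference, and the underlying resources are released only when the last holder lets go.

```java
class ReaderRefCountSketch {
    // Hypothetical shared segment reader guarded by a reference count.
    static final class SharedSegment {
        private int refs = 1;          // the creator holds the first reference
        private boolean closed = false;

        // Called when another searcher starts sharing this segment.
        synchronized void incRef() {
            if (closed) {
                throw new IllegalStateException("segment already closed");
            }
            refs++;
        }

        // Called when a searcher is done; the last release closes the segment.
        synchronized void decRef() {
            if (closed) {
                throw new IllegalStateException("segment already closed");
            }
            refs--;
            if (refs == 0) {
                closed = true; // a real reader would release files here
            }
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        SharedSegment segment = new SharedSegment(); // held by the old searcher
        segment.incRef();                            // new searcher shares it
        segment.decRef();                            // old searcher is closed
        System.out.println("closed after old searcher released: " + segment.isClosed());
        segment.decRef();                            // new searcher is closed
        System.out.println("closed after last release: " + segment.isClosed());
    }
}
```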

Does this make sense?  This discussion should probably move to the 
lucene-dev list.

> Are there any negatives to updateable IndexSearchers?  

Not if implemented well!

Doug



