lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: To clone or have a pluggable docidbitset for IndexReader
Date Tue, 16 Dec 2008 20:55:21 GMT

So it seems like a cloned reader would share everything with the
previous reader, but these rules would be enforced:

   * If the old reader had pending changes (held the write lock) when
     it was cloned, it 1) transfers the write lock to the clone, 2)
     refuses any further changes to itself (freezes itself), 3)
     continues to reflect the pending changes, and 4) will not commit
     its changes to disk when it's closed.  I.e., it freezes itself into
     a "point in time" snapshot, just held in memory rather than in an
     on-disk index.

   * If any changes (to deletions or norms) are done with the new
     reader, it then makes a private copy ("copy on write").  This
     would apply to reopen too, since clone & reopen share the same
     code; so this is an "improvement" over the current reopen
     semantics and we should fix the javadocs saying so.
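
To make the two rules above concrete, here is a rough sketch of how they
could fit together.  It is only an illustration under assumptions:
SketchReader, cloneReader() and the boolean "write lock" flag are
hypothetical names, not Lucene's IndexReader API; it models deletions
only (no norms), and it does not model lock contention between readers.

```java
import java.util.BitSet;

/**
 * Hypothetical stand-in for a reader, used only to illustrate the rules
 * above. It is not Lucene's IndexReader and models deletions only.
 */
final class SketchReader {
    private BitSet deletedDocs;     // possibly shared with another reader
    private boolean shared;         // true until this reader makes a private copy
    private boolean frozen;         // true once pending changes were handed to a clone
    private boolean holdsWriteLock; // true while this reader has uncommitted changes

    SketchReader(int maxDoc) {
        this.deletedDocs = new BitSet(maxDoc);
    }

    private SketchReader(BitSet deletedDocs, boolean holdsWriteLock) {
        this.deletedDocs = deletedDocs;
        this.shared = true;
        this.holdsWriteLock = holdsWriteLock;
    }

    /** Clone without copying: both readers point at the same bits for now. */
    SketchReader cloneReader() {
        SketchReader clone = new SketchReader(deletedDocs, holdsWriteLock);
        shared = true;
        if (holdsWriteLock) {
            holdsWriteLock = false; // 1) the write lock moves to the clone
            frozen = true;          // 2) the old reader refuses further changes
        }
        return clone;
    }

    void deleteDocument(int doc) {
        if (frozen) {
            throw new IllegalStateException("reader is frozen after clone");
        }
        if (shared) {
            // Copy on write: take a private copy before the first change.
            deletedDocs = (BitSet) deletedDocs.clone();
            shared = false;
        }
        holdsWriteLock = true;
        deletedDocs.set(doc);
    }

    /** 3) A frozen reader keeps reflecting the changes it had when cloned. */
    boolean isDeleted(int doc) {
        return deletedDocs.get(doc);
    }

    /** 4) Returns true only if closing this reader would commit changes. */
    boolean close() {
        return holdsWriteLock;
    }
}
```

In this sketch, if a reader deletes doc 3, is cloned, and the clone then
deletes doc 5, the old reader still sees only doc 3 as deleted and
rejects new deletes, while the clone sees both and is the one that would
commit on close.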

It seems like the only reason to clone would be if you intend to
[further] change deletions or norms but still want to use the previous
reader w/ the unchanged deletions and norms, ie "snapshot" the
previous reader without going through disk as intermediary, right?

I think this is a reasonable use case.  Since an IndexReader can still
make changes (something I think we should eventually move away from,
but cannot, yet, because of the immediacy of deletions that
IndexReader offers), cloning is an important tool to let you make an
efficient "point in time" snapshot (without having to go through the
Directory).

If this makes sense, can you update the patch on LUCENE-1314 to
enforce these semantics?  I think we should get this in for 2.9?

Mike

Jason Rutherglen wrote:

> Mike,
>
> > needing a fast way to swap in your own deleted docs?
>
> Yes, however it is necessary to have a new IndexReader as well from
> a "reopened" reader.  So clone seems the best approach (unless
> there's a way I'm not seeing).  The clone code is coming along, and
> the norms test seems to pass.  Cloning should be fine as long as
> rules similar to those for reopen are followed, such as this one
> from the javadoc: "The re-opened reader instance and the old
> instance might share the same resources. For this reason no index
> modification operations (e. g. deleteDocument(int), setNorm(int,
> String, byte)) should be performed using one of the readers until
> the old reader instance is closed. Otherwise, the behavior of the
> readers is undefined."
>
> I think the clone method javadoc should read "After cloning a
> reader, the original reader will throw exceptions on index
> modification operations (e. g. deleteDocument(int), setNorm(int,
> String, byte))".  This way one may read from the original, but the
> cloned reader (new reader) may accept updates.  This happens by way
> of automatically releasing a lock on clone (does this cause any
> unforeseen problems?).
>
> Jason
>
> On Tue, Dec 16, 2008 at 7:00 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
> Jason,
>
> Is your need for IndexReader.clone entirely driven by needing a fast  
> way to swap in your own deleted docs?
>
> Meaning, if you could plug in your own deleted docs to a reader  
> (somehow), would you not use clone anymore?
>
> Mike
>
>
> Jason Rutherglen wrote:
>
> Hello,
>
> In trying to figure out the best way to build a realtime system in
> which the deletedDocs do not need to be saved, I see two possible
> methods: 1) setting the DocIdBitSet manually (which breaks saving,
> among other things, but does not require cloning norms), or
> 2) implementing IndexReader.clone, which requires "copy on write"
> for deletedDocs and norms.
>
> The discussion about reopen
> (https://issues.apache.org/jira/browse/LUCENE-743) was lengthy and I
> can see from the code and the discussion why no one wants to revisit
> IndexReader.reopen in the form of IndexReader.clone and possibly
> mess things up.
>
> Is there some easier alternative API that I'm missing?
>
> -J
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



