lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: Questions about doc store files (.cfx)
Date Tue, 10 Nov 2009 05:06:20 GMT
On 11/9/09 5:40 PM, Michael Busch wrote:
> I think that should be ok with parallel indexing, as long as we can 
> always select all corresponding segments from *all* parallel indexes 
> for a merge to keep the docIds in sync.
>
> That actually leads me to another question: Let's say you have three 
> segments a, b, c.  b and c share the same doc store. You perform 
> deletes on a and b. Then you call expungeDeletes(). Normally that call 
> should only merge a and b, because c doesn't have any deletes. But b 
> and c have to participate in the same merge, because they share the 
> same doc store, right? So would it merge all three segments?
>
> If that's the case (that b and c must be part of the same merge) then 
> it would make the parallel indexing more difficult. The reason is that 
> if two parallel indexes 1 and 2 can each decide on their own how to
> share e.g. doc stores across segments, then we might end up in a situation
> where 1a and 1b share the same doc store, and 2b and 2c share the same 
> doc store. Then if index 1 needs to merge 1a and 1b, it can't assume 
> that this merge is allowed. There would have to be someone on top of 
> the whole thing who decides that all three segments need to be merged 
> at the same time, because b is connected to a and c in the two 
> parallel indexes. I wouldn't like such a restriction very much.
>
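
To make the constraint above concrete, here is a minimal sketch of how
the "someone on top" would have to group segments. It assumes the
restriction under discussion really holds (every segment sharing a doc
store with a merged segment must join the same merge). The function
name and the data layout are hypothetical and not part of Lucene.

```python
from collections import defaultdict


def merge_groups(segments, shared_doc_stores):
    """Group segment positions that would have to be merged together.

    segments: ordered segment labels, e.g. ["a", "b", "c"].
    shared_doc_stores: one list per parallel index; each entry is a set
        of labels whose segments share a doc store in that index.
    """
    parent = {s: s for s in segments}

    def find(s):
        # Follow parent links to the representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    def union(x, y):
        parent[find(x)] = find(y)

    # Sharing in ANY parallel index ties the segment positions together,
    # because docIds must stay in sync across all parallel indexes.
    for index_sharing in shared_doc_stores:
        for group in index_sharing:
            first, *rest = sorted(group)
            for other in rest:
                union(first, other)

    groups = defaultdict(list)
    for s in segments:
        groups[find(s)].append(s)
    return sorted(groups.values())


index1 = [{"a", "b"}]  # 1a and 1b share a doc store
index2 = [{"b", "c"}]  # 2b and 2c share a doc store
print(merge_groups(["a", "b", "c"], [index1, index2]))
```

With both indexes considered, a, b and c fall into a single group, so
index 1 cannot merge 1a and 1b on its own. With index 1 alone, the
groups would be [a, b] and [c].
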
> We could think about allowing merges like ab->d, even if b and c share
> the same doc store. That would mean copying the b part of the shared bc
> doc store into the new segment d. Then, until c gets deleted, the stored
> docs of b would be on disk twice and temporarily require more disk space.
>

I think this is exactly what already happens. I wrote a small test
program that reproduces the expungeDeletes() scenario described above.
It ends up with a doc store containing docs from two segments, but after
expungeDeletes() only one segment still references that doc store. The
non-deleted docs from the other segment end up in a new segment, so they
are on disk twice (once orphaned in the old doc store, once in the new
segment).
Is that the desired behavior?
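
The observation can be illustrated with a toy model. This is not the
test program mentioned above and it does not call Lucene. It only
assumes that the merge copies live stored docs into a doc store private
to the new segment, and that a shared doc store stays on disk while any
segment still references it. All names are hypothetical.

```python
def expunge_deletes(segments, doc_stores, new_name="d"):
    """Toy model: merge all segments that have deletions into one segment.

    segments: dict name -> {"store": str, "docs": list, "deleted": set}
    doc_stores: dict store name -> list of stored docs on disk
    """
    merged = [name for name, seg in segments.items() if seg["deleted"]]
    live = [doc
            for name in merged
            for doc in segments[name]["docs"]
            if doc not in segments[name]["deleted"]]

    # Assumption: live stored docs are copied into a new, private doc store.
    doc_stores[new_name] = list(live)
    for name in merged:
        del segments[name]
    segments[new_name] = {"store": new_name, "docs": live, "deleted": set()}

    # A doc store can only be removed once no segment references it.
    referenced = {seg["store"] for seg in segments.values()}
    for store in list(doc_stores):
        if store not in referenced:
            del doc_stores[store]
    return segments, doc_stores


segments = {
    "a": {"store": "a", "docs": ["a0", "a1"], "deleted": {"a0"}},
    "b": {"store": "bc", "docs": ["b0", "b1"], "deleted": {"b0"}},
    "c": {"store": "bc", "docs": ["c0", "c1"], "deleted": set()},
}
doc_stores = {"a": ["a0", "a1"], "bc": ["b0", "b1", "c0", "c1"]}

segments, doc_stores = expunge_deletes(segments, doc_stores)
copies_of_b1 = sum(store.count("b1") for store in doc_stores.values())
print(sorted(segments), sorted(doc_stores), copies_of_b1)
```

In this model the bc doc store survives because c still references it,
so the live doc b1 exists both there (orphaned) and in the doc store of
the new segment d.
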

  Michael

> Well maybe there is already a solution for all this in the code and 
> I'm just not aware of it?
>
>  Michael
>
>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>

