lucene-dev mailing list archives

From Mark Miller
Subject Re: Solr updateRequestHandler and performance vs. atomicity
Date Mon, 24 May 2010 19:33:29 GMT
On 5/24/10 3:10 PM, wrote:
> Hi all,
> It seems to me that the “commit” logic in the Solr updateRequestHandler
> (or wherever the logic is actually located) conflates two different
> semantics. One semantic is what you need to do to make the index process
> perform well. The other semantic is guaranteed atomicity of document
> reception by Solr.
> In particular, it would be nice to be able to post documents in such a
> way that you can guarantee that the document is permanently in Solr’s
> queue, safe in the event of a Solr restart, etc., even if the document
> has not yet been “committed”.
> This issue came up in the LCF talk that I gave, and I initially thought
> that separating the two kinds of events would necessarily be an LCF
> change, but the more I thought about it the more I realized that other
> Solr indexing clients may also benefit from such a separation.
> Does anyone agree? Where should this logic properly live?
> Thanks,
> Karl

It's an interesting idea - but I think you would likely pay a similar
cost to guarantee reception as you would to commit (also, I'm not sure
Lucene guarantees it - its commit works for consistency, but I'm not so
sure it achieves durability).

I can think of two things offhand -

Perhaps store the raw document text and use fsync to quasi-guarantee
acceptance - then index from that store on the commit.
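
A minimal Java sketch of that first idea, under stated assumptions: the
AcceptanceLog class, its length-prefixed record format, and the
Consumer-based indexer hook are all hypothetical and not part of Solr or
Lucene. Each document is appended to a log file and forced to disk with
FileChannel.force before accept() returns; commit() replays the log into
whatever indexer is supplied and then truncates it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical durable "acceptance" log; illustrative only. */
class AcceptanceLog implements AutoCloseable {
    private final Path path;
    private final FileChannel channel;

    AcceptanceLog(Path path) throws IOException {
        this.path = path;
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        // Continue after any records that survived a previous run.
        channel.position(channel.size());
    }

    /** Appends one length-prefixed record and forces it to disk before returning. */
    synchronized void accept(String doc) throws IOException {
        byte[] body = doc.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length).put(body).flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true); // the fsync step: flush data and metadata
    }

    /** Reads back every accepted but not yet committed document. */
    synchronized List<String> pending() throws IOException {
        List<String> docs = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(path));
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                break; // torn record at the tail, e.g. crash mid-write
            }
            byte[] body = new byte[len];
            buf.get(body);
            docs.add(new String(body, StandardCharsets.UTF_8));
        }
        return docs;
    }

    /** Feeds pending documents to the indexer, then empties the log. */
    synchronized int commit(Consumer<String> indexer) throws IOException {
        List<String> docs = pending();
        docs.forEach(indexer);
        // Assumption: the indexer has durably committed before we truncate.
        channel.truncate(0);
        channel.force(true);
        return docs.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("acceptance", ".log");
        try (AcceptanceLog log = new AcceptanceLog(file)) {
            log.accept("<doc><field name=\"id\">1</field></doc>");
            int n = log.commit(doc -> System.out.println("indexing " + doc));
            System.out.println("committed " + n + " document(s)");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Two caveats on this sketch. It pays one fsync per accepted document,
which is the "similar cost" concern above, and how strong the guarantee
is depends on the OS and disk honoring the flush. A crash between
indexing and the truncate would replay documents on restart, so the
indexer would need to tolerate duplicates (for example by overwriting on
a unique key).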

Another, simpler idea, if only the separation is important and not the
performance - index to a separate side index, taking advantage of Lucene's
current commit functionality, and then use IndexWriter.addIndexes to merge
it into the main index on commit.

Just spitballing though.

I think this would obviously need to be an optional mode.

- Mark
