lucene-dev mailing list archives

From <>
Subject RE: Solr updateRequestHandler and performance vs. atomicity
Date Mon, 24 May 2010 20:03:30 GMT
Hi Mark,

Unfortunately, indexing performance *is* of concern, otherwise I'd already be committing on
every post.

If your guess is correct, you are basically saying that adding a document to an index in Solr/Lucene
is just as fast as writing that file directly to the disk.  Because, obviously, if we want
guaranteed delivery, that's what we'd have to do.  But I think this is worth the experiment
- Solr/Lucene may be fast, but I have doubts that it can perform as well as raw disk I/O and
still manage to do anything in the way of document analysis or (heaven forbid) text extraction.

-----Original Message-----
From: ext Mark Miller [] 
Sent: Monday, May 24, 2010 3:33 PM
Subject: Re: Solr updateRequestHandler and performance vs. atomicity

On 5/24/10 3:10 PM, wrote:
> Hi all,
> It seems to me that the "commit" logic in the Solr updateRequestHandler
> (or wherever the logic is actually located) conflates two different
> semantics. One semantic is what you need to do to make the index process
> perform well. The other semantic is guaranteed atomicity of document
> reception by Solr.
> In particular, it would be nice to be able to post documents in such a
> way that you can guarantee that the document is permanently in Solr's
> queue, safe in the event of a Solr restart, etc., even if the document
> has not yet been "committed".
> This issue came up in the LCF talk that I gave, and I initially thought
> that separating the two kinds of events would necessarily be an LCF
> change, but the more I thought about it the more I realized that other
> Solr indexing clients may also benefit from such a separation.
> Does anyone agree? Where should this logic properly live?
> Thanks,
> Karl

It's an interesting idea - but I think you would likely pay a similar 
cost to guarantee reception as you would to commit (also, I'm not sure 
Lucene guarantees it - it works for consistency, but I'm not so sure it 
achieves durability).

I can think of two things offhand -

Perhaps store the text and use fsync to quasi guarantee acceptance - 
then index from the store on the commit.
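
A minimal sketch of the store-and-fsync idea, not anything Solr ships: the class name DurableJournal and its accept/replay methods are hypothetical, and the length-prefixed record format is an assumption made for illustration. Each accepted document is appended to a journal file and forced to disk with FileChannel.force(true), the Java counterpart of fsync, before the call returns. At commit time (or after a restart) the journal can be replayed and the documents handed to the indexer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical journal: accept() returns only after the record is forced to disk. */
final class DurableJournal implements AutoCloseable {
    private final Path path;
    private final FileChannel channel;

    DurableJournal(Path path) throws IOException {
        this.path = path;
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /** Append one length-prefixed record, then force data and metadata to the device. */
    void accept(String doc) throws IOException {
        byte[] body = doc.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length);
        buf.put(body);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true); // this is the fsync step, and where the cost is paid
    }

    /** Read back every complete record, e.g. at commit time or after a restart. */
    List<String> replay() throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(path));
        List<String> docs = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                break; // incomplete tail record from an interrupted write; ignore it
            }
            byte[] body = new byte[len];
            buf.get(body);
            docs.add(new String(body, StandardCharsets.UTF_8));
        }
        return docs;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The sketch forces the file once per document, so throughput is bounded by the device's sync latency. Batching several documents per force() call would trade some of the acceptance guarantee for speed.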

Another simpler idea if only the separation is important and not the 
performance - index to another side index, taking advantage of Lucene's 
current commit functionality, and then use addIndex to merge to the main 
index on commit.

Just spitballing though.

I think this would obviously need to be an optional mode.

- Mark

To unsubscribe, e-mail:
For additional commands, e-mail:
