lucene-dev mailing list archives

From Simon Willnauer <>
Subject Re: Solr updateRequestHandler and performance vs. atomicity
Date Mon, 24 May 2010 20:29:04 GMT
Hi Karl,

What you are describing seems to be a good use case for something like
a message queue, where you push a document or record to a queue that
guarantees persistence. I look at this from a slightly
different perspective: in a distributed environment you would have to
guarantee delivery not to a single Solr instance but to several, or at
least n, instances. But that is a different story.

From a Solr point of view this sounds like a need for a write-ahead
log that guarantees durability and atomicity. I like this idea as it
might also solve lots of problems in distributed environments (solr
cloud) etc.
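
To make the write-ahead log idea concrete, here is a minimal sketch in
plain Java. It assumes each posted document can be treated as a UTF-8
string. The class WriteAheadLog and its methods append and replay are
hypothetical names used for illustration; nothing here is existing Solr
or Lucene API. A record counts as accepted only after it has been forced
to disk, which is separate from any index commit.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only log; an illustration, not Solr or Lucene code. */
final class WriteAheadLog implements AutoCloseable {
    private final FileChannel channel;

    WriteAheadLog(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /** Appends one length-prefixed record and returns only after forcing it to disk. */
    void append(String document) throws IOException {
        byte[] payload = document.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(Integer.BYTES + payload.length);
        record.putInt(payload.length);
        record.put(payload);
        record.flip();
        while (record.hasRemaining()) {
            channel.write(record);
        }
        // Ask the OS to flush file data and metadata to the storage device.
        channel.force(true);
    }

    /** Reads back all complete records, e.g. after a restart. */
    static List<String> replay(Path file) throws IOException {
        List<String> documents = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                byte[] payload;
                try {
                    int length = in.readInt();
                    if (length < 0) {
                        break; // corrupt length prefix: stop replaying
                    }
                    payload = new byte[length];
                    in.readFully(payload);
                } catch (EOFException endOrTornRecord) {
                    break; // clean end of log, or a record cut short by a crash
                }
                documents.add(new String(payload, StandardCharsets.UTF_8));
            }
        }
        return documents;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

On commit, the indexer would replay the log into the index and then
truncate it. The sketch has no checksums and forces the file once per
document, so it shows the durability cost under discussion rather than
avoiding it; batching several documents per force would be the obvious
next step.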

Very interesting topic - should investigate more in this direction....


On Mon, May 24, 2010 at 10:03 PM,  <> wrote:
> Hi Mark,
> Unfortunately, indexing performance *is* of concern, otherwise I'd already be
> committing on every post.
> If your guess is correct, you are basically saying that adding a document to an
> index in Solr/Lucene is just as fast as writing that file directly to the disk.
> Because, obviously, if we want guaranteed delivery, that's what we'd have to do.
> But I think this is worth the experiment - Solr/Lucene may be fast, but I have
> doubts that it can perform as well as raw disk I/O and still manage to do
> anything in the way of document analysis or (heaven forbid) text extraction.
> -----Original Message-----
> From: ext Mark Miller []
> Sent: Monday, May 24, 2010 3:33 PM
> To:
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
> On 5/24/10 3:10 PM, wrote:
>> Hi all,
>> It seems to me that the "commit" logic in the Solr updateRequestHandler
>> (or wherever the logic is actually located) conflates two different
>> semantics. One semantic is what you need to do to make the index process
>> perform well. The other semantic is guaranteed atomicity of document
>> reception by Solr.
>> In particular, it would be nice to be able to post documents in such a
>> way that you can guarantee that the document is permanently in Solr's
>> queue, safe in the event of a Solr restart, etc., even if the document
>> has not yet been "committed".
>> This issue came up in the LCF talk that I gave, and I initially thought
>> that separating the two kinds of events would necessarily be an LCF
>> change, but the more I thought about it the more I realized that other
>> Solr indexing clients may also benefit from such a separation.
>> Does anyone agree? Where should this logic properly live?
>> Thanks,
>> Karl
> It's an interesting idea - but I think you would likely pay a similar
> cost to guarantee reception as you would to commit (also, I'm not sure
> Lucene guarantees it - it works for consistency, but I'm not so sure it
> achieves durability).
> I can think of two things offhand -
> Perhaps store the text and use fsync to quasi guarantee acceptance -
> then index from the store on the commit.
> Another simpler idea if only the separation is important and not the
> performance - index to another side index, taking advantage of Lucene's
> current commit functionality, and then use addIndexes to merge to the main
> index on commit.
> Just spitballing though.
> I think this would obviously need to be an optional mode.
> --
> - Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
> ---------------------------------------------------------------------