From: Simon Willnauer
To: dev@lucene.apache.org
Date: Mon, 24 May 2010 22:29:04 +0200
Subject: Re: Solr updateRequestHandler and performance vs. atomicity

Hi Karl,

What you are describing seems to be a good use case for something like a
message queue, where you push a document or record to a queue that
guarantees the queue's persistence. I look at this from a slightly
different perspective: in a distributed environment you would have to
guarantee delivery not to a single Solr instance but to several, or at
least n, instances, but that is a different story.

From a Solr point of view this sounds like a need for a write-ahead log
that guarantees durability and atomicity. I like this idea as it might
also solve lots of problems in distributed environments (Solr Cloud) etc.

Very interesting topic - we should investigate more in this direction.

simon

On Mon, May 24, 2010 at 10:03 PM, wrote:
> Hi Mark,
>
> Unfortunately, indexing performance *is* of concern, otherwise I'd
> already be committing on every post.
>
> If your guess is correct, you are basically saying that adding a
> document to an index in Solr/Lucene is just as fast as writing that
> file directly to the disk. Because, obviously, if we want guaranteed
> delivery, that's what we'd have to do.
> But I think this is worth the experiment - Solr/Lucene may be fast,
> but I have doubts that it can perform as well as raw disk I/O and
> still manage to do anything in the way of document analysis or
> (heaven forbid) text extraction.
>
>
>
> -----Original Message-----
> From: ext Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, May 24, 2010 3:33 PM
> To: dev@lucene.apache.org
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>
> On 5/24/10 3:10 PM, karl.wright@nokia.com wrote:
>> Hi all,
>> It seems to me that the "commit" logic in the Solr updateRequestHandler
>> (or wherever the logic is actually located) conflates two different
>> semantics. One semantic is what you need to do to make the index process
>> perform well. The other semantic is guaranteed atomicity of document
>> reception by Solr.
>> In particular, it would be nice to be able to post documents in such a
>> way that you can guarantee that the document is permanently in Solr's
>> queue, safe in the event of a Solr restart, etc., even if the document
>> has not yet been "committed".
>> This issue came up in the LCF talk that I gave, and I initially thought
>> that separating the two kinds of events would necessarily be an LCF
>> change, but the more I thought about it the more I realized that other
>> Solr indexing clients may also benefit from such a separation.
>> Does anyone agree? Where should this logic properly live?
>> Thanks,
>> Karl
>
> It's an interesting idea - but I think you would likely pay a similar
> cost to guarantee reception as you would to commit (also, I'm not sure
> Lucene guarantees it - it works for consistency, but I'm not so sure it
> achieves durability).
>
> I can think of two things offhand -
>
> Perhaps store the text and use fsync to quasi guarantee acceptance -
> then index from the store on the commit.
>
> Another simpler idea if only the separation is important and not the
> performance - index to another side index, taking advantage of Lucene's
> current commit functionality, and then use addIndex to merge to the main
> index on commit.
>
> Just spitballing though.
>
> I think this would obviously need to be an optional mode.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
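
A rough sketch of the "store the text and use fsync, then index from the
store on the commit" idea discussed above, which is essentially a small
write-ahead log. This is only an illustration under stated assumptions, not
anything Solr or Lucene provides: the names DocumentLog, accept, pending,
commit and index_fn are hypothetical, documents are assumed to be
JSON-serializable dicts, and index_fn stands in for whatever actually adds a
document to the index.

```python
import json
import os


class DocumentLog:
    """Hypothetical append-only log: a document counts as 'accepted'
    once its record has been fsynced, independent of any index commit."""

    def __init__(self, path):
        self.path = path
        # Append mode, so reopening after a restart continues the same log.
        self._fh = open(path, "ab")

    def accept(self, doc):
        # One JSON record per line. Flush Python's buffer, then ask the OS
        # to push the bytes to stable storage before acknowledging.
        self._fh.write(json.dumps(doc).encode("utf-8") + b"\n")
        self._fh.flush()
        os.fsync(self._fh.fileno())

    def pending(self):
        # Everything accepted but not yet committed, e.g. after a restart.
        with open(self.path, "rb") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def commit(self, index_fn):
        # Index every pending document, then empty the log. A crash between
        # indexing and truncation replays documents on the next commit, so
        # index_fn is assumed to tolerate duplicates (at-least-once).
        docs = self.pending()
        for doc in docs:
            index_fn(doc)
        self._fh.truncate(0)
        self._fh.flush()
        os.fsync(self._fh.fileno())
        return len(docs)

    def close(self):
        self._fh.close()
```

Each accept() call pays for one fsync, which is the cost Mark suspects would
be comparable to a commit; batching several documents per fsync would be the
obvious thing to measure in the experiment Karl mentions.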