From: Jason Rutherglen
To: java-dev@lucene.apache.org
Date: Mon, 21 Sep 2009 17:58:02 -0700
Subject: Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

I'm not sure I communicated the idea properly. If CMS
(ConcurrentMergeScheduler) is set to 1 thread, then no matter how CPU
intensive a merge is, it is limited to 1 core of what is in many cases
a 4 or 8 core server. That leaves the other 3 or 7 cores for queries.
If queries are still slow, that indicates it isn't the merging itself
that is slowing them down, but the eviction of the queried segments
from the system IO cache. This holds true regardless of the merge
policy used.

So while a new merge policy sounds great, unless the system IO cache
problem is solved there will always be a lingering problem with large
merges on a regularly updated index. Avoiding large merges probably
isn't the answer.
And LogByteSizeMergePolicy already allows managing the size of the
merged segments to some extent. I would personally prefer being able to
merge segments up to a given estimated size, which requires LUCENE-1076
to do well.

> is rather different from Lucene benchmark as we are testing high updates in a realtime environment

Lucene's benchmark allows this. NearRealtimeReaderTask is a good place
to start.

On Mon, Sep 21, 2009 at 4:50 PM, John Wang wrote:
> Jason:
>
> Before jumping to any conclusions, let me describe the test setup. It
> is rather different from Lucene benchmark as we are testing high updates in
> a realtime environment:
>
> We took a public corpus, medline, indexed to approximately 3 million
> docs, and updated all the docs over and over again for a 10 hour duration.
>
> The only differences in the code used were the MergePolicy settings
> applied.
>
> Taking the variable of HW/OS out of the equation, let's ignore the
> absolute numbers and compare the relative numbers between the two runs.
>
> The spike is due to the merging of a large segment when we accumulate. The
> graph/perf numbers fit our hypothesis that the default MergePolicy chooses
> to merge small segments before large ones and does not handle segments with
> a high number of deletes well.
>
> Merging is BOTH IO and CPU intensive, especially large merges.
>
> I think the wiki explains it pretty well.
>
> What you are saying is true with IO cache w.r.t. merge. Every time new
> files are created, old files in the IO cache are invalidated. As the experiment
> shows, this is detrimental to query performance when large segments are being
> merged.
>
> "As we move to a sharded model of indexes, large merges will
> naturally not occur." Our test is on a 3 million document index, not very
> large for a single shard.
> Some katta people have run it on a much, much
> larger index per shard. Saying large merges will not occur on indexes of
> this size IMHO is unfounded.
>
> -John
>
> On Tue, Sep 22, 2009 at 2:34 AM, Jason Rutherglen wrote:
>>
>> John,
>>
>> It would be great if Lucene's benchmark were used so everyone
>> could execute the test in their own environment and verify. It's
>> not clear what settings or code were used to generate the results, so
>> it's difficult to draw any reliable conclusions.
>>
>> The steep spike shows greater evidence for the IO cache being
>> cleared during large merges, resulting in search performance
>> degradation. See:
>> http://www.lucidimagination.com/search/?q=madvise
>>
>> Merging is IO intensive and less CPU intensive. If the
>> ConcurrentMergeScheduler is used, which defaults to 3 threads,
>> then the CPU could be maxed out. Using a single thread on
>> synchronous spinning magnetic media seems more logical. Queries
>> are usually the inverse: CPU intensive, not IO intensive, when
>> the index is in the IO cache. After merging a large segment (or
>> during), queries would start hitting disk, and the results
>> clearly show that. The queries are suddenly more time consuming
>> as they seek on disk at a time when IO activity is at its peak
>> from merging large segments. Using madvise would prevent usable
>> indexes from being swapped to disk during a merge, and query
>> performance would continue unabated.
>>
>> As we move to a sharded model of indexes, large merges will
>> naturally not occur. Shards will reach a specified size and new
>> documents will be sent to new shards.
>>
>> -J
>>
>> On Sun, Sep 20, 2009 at 11:12 PM, John Wang wrote:
>> > The current default Lucene MergePolicy does not handle frequent updates
>> > well.
>> >
>> > We have done some performance analysis with that and a custom merge
>> > policy:
>> >
>> > http://code.google.com/p/zoie/wiki/ZoieMergePolicy
>> >
>> > -John
>> >
>> > On Mon, Sep 21, 2009 at 1:08 PM, Jason Rutherglen <
>> > jason.rutherglen@gmail.com> wrote:
>> >
>> >> I opened SOLR-1447 for this
>> >>
>> >> 2009/9/18 Noble Paul നോബിള്‍ नोब्ळ् :
>> >> > We can use a simple reflection based implementation to simplify
>> >> > reading too many parameters.
>> >> >
>> >> > What I wish to emphasize is that Solr should be agnostic of xml
>> >> > altogether. It should only be aware of specific Objects and
>> >> > interfaces. If users wish to plug in something else in some other
>> >> > way, it should be fine.
>> >> >
>> >> > There is a huge amount of learning invested in the current
>> >> > solrconfig.xml. Let us not make people throw that away.
>> >> >
>> >> > On Sat, Sep 19, 2009 at 1:59 AM, Jason Rutherglen
>> >> > wrote:
>> >> >> Over the weekend I may write a patch to allow simple reflection
>> >> >> based injection from within solrconfig.
>> >> >>
>> >> >> On Fri, Sep 18, 2009 at 8:10 AM, Yonik Seeley
>> >> >> wrote:
>> >> >>> On Thu, Sep 17, 2009 at 4:30 PM, Shalin Shekhar Mangar
>> >> >>> wrote:
>> >> >>>>> I was wondering if there is a way I can modify
>> >> >>>>> calibrateSizeByDeletes just by configuration ?
>> >> >>>>>
>> >> >>>>
>> >> >>>> Alas, no. The only option that I see for you is to sub-class
>> >> >>>> LogByteSizeMergePolicy and set calibrateSizeByDeletes to true in
>> >> >>>> the constructor. However, please open a Jira issue so we don't
>> >> >>>> forget about it.
>> >> >>>
>> >> >>> It's the continuing stuff like this that makes me feel like we
>> >> >>> should be Spring (or equivalent) based someday...
>> >> >>> I'm just not sure how we're
>> >> >>> going to get there.
>> >> >>>
>> >> >>> -Yonik
>> >> >>> http://www.lucidimagination.com
>> >> >>>
>> >> >
>> >> > --
>> >> > -----------------------------------------------------
>> >> > Noble Paul | Principal Engineer| AOL | http://aol.com
>> >> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
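
For readers following the calibrateSizeByDeletes discussion in this thread, here is a minimal standalone sketch of the idea. It assumes the option discounts a segment's byte size by its fraction of deleted documents. The class and method names are hypothetical and not part of Lucene, and this is an illustration of the arithmetic only, not the patch itself.

```java
// Standalone sketch; CalibratedSegmentSize is a hypothetical name, not a Lucene class.
public class CalibratedSegmentSize {

    /**
     * Returns the segment size discounted by the fraction of deleted docs.
     * Assumption: this mirrors the idea behind calibrateSizeByDeletes,
     * not the exact code in LogByteSizeMergePolicy.
     */
    public static long calibratedSize(long sizeInBytes, int docCount, int delCount) {
        if (docCount <= 0) {
            return sizeInBytes; // no documents, so there is no ratio to apply
        }
        double delRatio = (double) delCount / docCount;
        return (long) (sizeInBytes * (1.0 - delRatio));
    }

    public static void main(String[] args) {
        // Two segments of equal size on disk; the second has 75% of its docs deleted.
        long fresh = calibratedSize(1_000_000L, 10_000, 0);
        long stale = calibratedSize(1_000_000L, 10_000, 7_500);
        System.out.println("fresh segment counts as " + fresh + " bytes");
        System.out.println("stale segment counts as " + stale + " bytes");
    }
}
```

Under this assumption, a size-based merge policy would treat the heavily deleted segment as a much smaller one, so it could be picked for merging with other small segments and have its deletes reclaimed sooner.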