From: Jason Rutherglen
To: java-user@lucene.apache.org
Date: Wed, 13 Jan 2010 15:08:23 -0800
Subject: Re: Max Segmentation Size when Optimizing Index
Message-ID: <85d3c3b61001131508p408c6903j966ba551e775f00e@mail.gmail.com>
In-Reply-To: <520273.93119.qm@web50307.mail.re2.yahoo.com>

Right... It all blends together, I need an NLP analyzer for my emails

On Wed, Jan 13, 2010 at 3:05 PM, Otis Gospodnetic wrote:
> I think Jason meant "15-20GB segments"?
>  Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
> ________________________________
> From: Jason Rutherglen
> To: java-user@lucene.apache.org
> Sent: Wed, January 13, 2010 5:54:38 PM
> Subject: Re: Max Segmentation Size when Optimizing Index
>
> Yes... You could hack LogMergePolicy to do something else.
>
> I use optimise(numsegments:5) regularly on 80GB indexes, that if
> optimized to 1 segment, would thrash the IO excessively.
> This works fine because 15-20GB indexes are plenty large and fast.
>
> On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittumrong wrote:
>> Seems like optimize() only cares about the final number of segments rather
>> than the size of each segment. Is that so?
>>
>> On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>
>>> There's a different method in LogMergePolicy that performs the
>>> optimize... Right, so normal merging uses the findMerges method, then
>>> there's a findMergeOptimize (method names could be inaccurate).
>>>
>>> On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong wrote:
>>> > Do you mean MergePolicy is only used during index time and will be
>>> > ignored by the Optimize() process?
>>> >
>>> > On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>> >
>>> >> Oh ok, you're asking about optimizing... I think that's a different
>>> >> algorithm inside LogMergePolicy.  I think it ignores the maxMergeMB
>>> >> param.
>>> >>
>>> >> On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong wrote:
>>> >> > Thanks, Jason.
>>> >> >
>>> >> > Is my understanding correct that
>>> >> > LogByteSizeMergePolicy.setMaxMergeMB(100) will prevent
>>> >> > merging of two segments that are larger than 100 MB each at
>>> >> > optimizing time?
>>> >> >
>>> >> > If so, why do you think I would still see segments that are larger
>>> >> > than 200 MB?
>>> >> >
>>> >> > On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>> >> >
>>> >> >> Hi Trin,
>>> >> >>
>>> >> >> There was recently a discussion about this, the max size is
>>> >> >> for the before merge segments, rather than the resultant merged
>>> >> >> segment (if that makes sense). It'd be great if we had a merge
>>> >> >> policy that limited the resultant merged segment, though that'd
>>> >> >> be a rough approximation at best.
>>> >> >>
>>> >> >> Jason
>>> >> >>
>>> >> >> On Wed, Jan 13, 2010 at 1:36 PM, Trin Chavalittumrong <mrtrin@gmail.com> wrote:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > I am trying to optimize the index, which would merge different
>>> >> >> > segments together. Let's say the index folder is 1 GB in total; I
>>> >> >> > need each segment to be no larger than 200 MB. I tried to use
>>> >> >> > *LogByteSizeMergePolicy* and setMaxMergeMB(100) to ensure no
>>> >> >> > segment after merging would be over 200 MB. However, I still see
>>> >> >> > segments that are larger than 200 MB. I did call
>>> >> >> > IndexWriter.optimize(20) to make sure there are enough segments
>>> >> >> > to allow each segment to be under 200 MB.
>>> >> >> >
>>> >> >> > Can someone let me know if I am using this right? Any suggestion
>>> >> >> > on how to tackle this would be helpful.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> >
>>> >> >> > Trin
>>> >> >> >
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
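
To make the point in the thread concrete (the max size is checked against the segments going into a merge, not against the merged result), here is a minimal sketch in plain Java. It is a simplified model and not Lucene's LogByteSizeMergePolicy code: the class name MergeSizeSketch, the method mergeOnce, the single left-to-right pass, and the "at or above the cap" comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model (an assumption for illustration, NOT Lucene's actual
 * merge policy) of a size cap that is checked per input segment.
 */
final class MergeSizeSketch {

    /**
     * Performs one left-to-right pass over segment sizes (in MB).
     * Segments at or above maxMergeMB are left alone; runs of up to
     * mergeFactor smaller segments are merged by summing their sizes.
     * Note that nothing here limits the size of the merged result.
     */
    static List<Long> mergeOnce(List<Long> segmentSizesMB, long maxMergeMB, int mergeFactor) {
        List<Long> result = new ArrayList<>();
        long pendingTotal = 0;
        int pendingCount = 0;
        for (long size : segmentSizesMB) {
            if (size >= maxMergeMB) {
                // Too large to be a merge input: flush any pending run, keep as-is.
                if (pendingCount > 0) {
                    result.add(pendingTotal);
                    pendingTotal = 0;
                    pendingCount = 0;
                }
                result.add(size);
                continue;
            }
            // Eligible input: the cap was checked against this segment only.
            pendingTotal += size;
            pendingCount++;
            if (pendingCount == mergeFactor) {
                result.add(pendingTotal);
                pendingTotal = 0;
                pendingCount = 0;
            }
        }
        if (pendingCount > 0) {
            result.add(pendingTotal);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> before = Arrays.asList(90L, 80L, 95L, 150L, 40L);
        // Cap of 100 MB per input segment, up to 3 segments per merge.
        System.out.println(mergeOnce(before, 100L, 3));
    }
}
```

With these made-up sizes, the three segments under 100 MB are merged into a single 265 MB segment, while the 150 MB segment is skipped as a merge input. That is how a segment larger than 200 MB can appear even with a 100 MB cap in this model. Whether optimize() honors the cap at all is a separate question, as discussed above.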