Date: Thu, 2 Dec 2010 10:43:11 +0100
Subject: Re: Consolidate MP and LMP
From: Simon Willnauer
Reply-To: simon.willnauer@gmail.com
To: dev@lucene.apache.org

Shai, I always have trouble figuring out what you are talking about in
the first place, since you never introduce the abbreviations you are
using :) ->> Consolidate MP and LMP (WTF) :)

On Thu, Dec 2, 2010 at 10:33 AM, Michael McCandless wrote:
> Hmm... but MergePolicy is our abstract base class, and LogMergePolicy
> adds a lot of concrete stuff, ie choosing merges according to "level"
> so that you get an exponential staircase of segments in your index.

ah MergePolicy :)

>
> Conceptually it seems like they are separate?  Like if another merge
> policy came along that had the freedom to pick variable-mergeFactor
> segments to merge at once... shouldn't it subclass MP?  Or, say we fix
> IW to allow out-of-order merges, shouldn't that also subclass MP and
> not LMP?

I agree we should stick to that, but requiring LMP in IW really should
be fixed. I see no reason why IW should rely on LogMergePolicy, and it
should be easy to move the required methods up to MergePolicy.

>
> I do agree IW requires LMP in some places, but isn't this limited to
> certain methods, eg set/getUseCompoundFile?  Maybe we can move just
> these methods up?

+1

During the work on Column Stride Fields I was actually thinking that
Compound vs. Non-Compound should not be a global decision, since we now
have codecs and each codec should use its own way of writing files.
Maybe it would make things way easier if we exposed CFS to codecs and
let them decide what to do. I can imagine wanting to use CFS for some
of the codecs, like Column Stride or fields that are not used for
searches, but keeping individual files per codec for others.

Just an idea....

Simon

>
> Mike
>
> On Thu, Dec 2, 2010 at 4:25 AM, Shai Erera wrote:
>> Hi
>>
>> While IndexWriter declares it accepts a general MP, it will actually fail if
>> the given instance is not LogMP. So I wonder if we shouldn't consolidate
>> both of them into one, and pull up all of LMP features to MP. I think all of
>> LMP's features are useful for any kind of MP, and if someone wants to ignore
>> them he still can.
>>
>> This is not the sort of change that fits well in trunk. IMO it can fit well
>> in 3x too since IW didn't accept anything that is not LMP. So even if it
>> will appear we're breaking back-compat, we actually won't. Which is another
>> reason, for me, why those two should be consolidated.
>>
>> What do you think?
>>
>> Shai
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
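
To make the "move just these methods up" idea from the thread concrete, here is a minimal sketch. It is a simplified model, not Lucene's actual API: every class name ending in Sketch is hypothetical, and only the compound-file getter/setter pair is modeled. It shows the base class owning the setting, so the writer can ask any policy without casting to the log-based subclass.

```java
// Hypothetical, simplified model; none of these classes are Lucene's real API.
abstract class MergePolicySketch {
    private boolean useCompoundFile = true;

    // Pulled up from the log-based subclass so that any policy can answer.
    public boolean getUseCompoundFile() { return useCompoundFile; }
    public void setUseCompoundFile(boolean value) { useCompoundFile = value; }

    // Each policy still decides on its own how segments are selected.
    abstract String describeSelection();
}

// Stands in for a level-based policy such as LogMergePolicy.
class LogMergePolicySketch extends MergePolicySketch {
    @Override String describeSelection() { return "by level"; }
}

// Stands in for a policy that subclasses the base class directly.
class VariableFactorPolicySketch extends MergePolicySketch {
    @Override String describeSelection() { return "variable merge factor"; }
}

// Stands in for the writer: it depends only on the abstract base class.
class IndexWriterSketch {
    private final MergePolicySketch policy;

    IndexWriterSketch(MergePolicySketch policy) { this.policy = policy; }

    // No cast to the log-based subclass is needed here.
    boolean getUseCompoundFile() { return policy.getUseCompoundFile(); }
}

public class PullUpDemo {
    public static void main(String[] args) {
        MergePolicySketch policy = new VariableFactorPolicySketch();
        policy.setUseCompoundFile(false);
        IndexWriterSketch writer = new IndexWriterSketch(policy);
        System.out.println(policy.describeSelection()
                + " -> compound file: " + writer.getUseCompoundFile());
    }
}
```

The two policies stay separate subclasses, as Mike suggests, while the writer no longer cares which one it was given. Whether the setting belongs on the merge policy at all, or with the codecs as floated above, is a separate question this sketch does not address.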