From: Shai Erera
To: dev@lucene.apache.org
Date: Thu, 2 Dec 2010 12:21:08 +0200
Subject: Re: Consolidate MP and LMP

Simon, sorry about that: MP = MergePolicy, LMP = LogMergePolicy. WTF = ... well, I know what that is :-).

In 3x, IW.getLogMergePolicy, which is called by some public (albeit deprecated) APIs, throws IllegalArgumentException if the MP is not an LMP. In trunk this API is gone now (thanks to Earwin?), so it's less of a problem. Still, IW checks instanceof LMP just to know whether it should create a compound file, which is a setting unrelated to LMP. If I write my own MP, does it mean IW will never create compound segments?

Hmm .. now that I look closely at it, MP has useCompoundFile/DocStore methods, and LMP just adds getUseCompoundFile(). Why?
And why does IndexWriter.addIndexes(IndexReader...) check instanceof LMP instead of calling mp.useCompoundFile()?
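
To make the concern concrete, here is a minimal sketch of the two ways a writer could decide on compound files. All classes below (Policy, LogPolicy, CustomPolicy, CompoundFileCheckDemo) are simplified, hypothetical stand-ins and not the actual Lucene classes; in particular, useCompoundFile() is shown without arguments, as written in this message.

```java
// Simplified stand-ins for illustration only; these are NOT the real Lucene classes.
class CompoundFileCheckDemo {
    // Mirrors the pattern questioned above: the compound-file setting is
    // only honored when the policy happens to be a LogPolicy.
    static boolean viaInstanceof(Policy mp) {
        if (mp instanceof LogPolicy) {
            return ((LogPolicy) mp).getUseCompoundFile();
        }
        return false; // any other policy silently gets no compound files
    }

    // The alternative: ask whatever policy was supplied.
    static boolean viaPolicy(Policy mp) {
        return mp.useCompoundFile();
    }

    public static void main(String[] args) {
        Policy custom = new CustomPolicy();
        System.out.println("instanceof check: " + viaInstanceof(custom)); // false
        System.out.println("polymorphic call: " + viaPolicy(custom));     // true
    }
}

// Hypothetical general policy with a single compound-file question.
abstract class Policy {
    abstract boolean useCompoundFile();
}

// Hypothetical log-based policy that also exposes a getter for the setting.
class LogPolicy extends Policy {
    private final boolean useCompoundFile = true;

    boolean getUseCompoundFile() {
        return useCompoundFile;
    }

    @Override
    boolean useCompoundFile() {
        return useCompoundFile;
    }
}

// A user-written policy that wants compound files but is not a LogPolicy.
class CustomPolicy extends Policy {
    @Override
    boolean useCompoundFile() {
        return true;
    }
}
```

With the instanceof variant, the custom policy's wish for compound files is ignored; with the polymorphic call, it is respected without the caller knowing the concrete type.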

So perhaps we should:

1) Fix IW to not cast to LMP just to ask whether it should create compound files. Then we can perhaps remove getLogMergePolicy from IW on 3x, which also removes a source of confusion.
2) Look at LMP and decide whether there are methods we believe belong on a general MP, such as mergeFactor or maxMergeDocs. LogMP is special in how it picks segments for merge, that is, log-based (levels). But maxMergeDocs, maxMergeSize and mergeFactor are unrelated to log/levels. This is the sort of functionality I'd expect to find on a general MP impl.
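
One possible shape for point 2, sketched under loud assumptions: GeneralPolicy, TakeFirstPolicy, eligible() and pickSegments() are hypothetical names invented for this illustration, and segments are reduced to plain document counts. It only shows that mergeFactor and maxMergeDocs can live on a general policy while the selection strategy (log-based or otherwise) stays in the subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these classes exist in Lucene.
class SharedSettingsDemo {
    public static void main(String[] args) {
        GeneralPolicy mp = new TakeFirstPolicy();
        mp.setMergeFactor(3);
        mp.setMaxMergeDocs(100);
        // The 500-doc segment exceeds maxMergeDocs and is skipped.
        System.out.println(mp.pickSegments(new int[] {10, 500, 20, 30, 40})); // [10, 20, 30]
    }
}

// Settings that do not depend on log levels sit on the general policy.
abstract class GeneralPolicy {
    private int mergeFactor = 10;
    private int maxMergeDocs = Integer.MAX_VALUE;

    void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }
    int getMergeFactor() { return mergeFactor; }
    void setMaxMergeDocs(int maxMergeDocs) { this.maxMergeDocs = maxMergeDocs; }
    int getMaxMergeDocs() { return maxMergeDocs; }

    // Segments larger than maxMergeDocs are never merge candidates,
    // whatever the selection strategy is.
    protected List<Integer> eligible(int[] segmentDocCounts) {
        List<Integer> result = new ArrayList<>();
        for (int docCount : segmentDocCounts) {
            if (docCount <= maxMergeDocs) {
                result.add(docCount);
            }
        }
        return result;
    }

    // How segments are picked is left to the concrete policy.
    abstract List<Integer> pickSegments(int[] segmentDocCounts);
}

// A deliberately trivial, non-log strategy: merge the first mergeFactor eligible segments.
class TakeFirstPolicy extends GeneralPolicy {
    @Override
    List<Integer> pickSegments(int[] segmentDocCounts) {
        List<Integer> candidates = eligible(segmentDocCounts);
        int count = Math.min(getMergeFactor(), candidates.size());
        return new ArrayList<>(candidates.subList(0, count));
    }
}
```

A log-based policy would override pickSegments with its level logic and still reuse the same two settings.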

Shai

On Thu, Dec 2, 2010 at 11:44 AM, Earwin Burrfoot <earwin@gmail.com> wrote:
> Actually, in trunk IW doesn't break on anything else.
>
> There's one private no-longer-used method I forgot to delete on my
> drop-all-deprecations spree.
> And there's a block in addIndexes, that explicitly checks instanceof,
> and only then casts to LMP.
>
> I'm against consolidating MP and LMP. MP is a damn interface!
> We should strive to make things less coupled rather than other way around.
>
> On Thu, Dec 2, 2010 at 12:25, Shai Erera <serera@gmail.com> wrote:
> > Hi
> >
> > While IndexWriter declares it accepts a general MP, it will actually fail if
> > the given instance is not LogMP. So I wonder if we shouldn't consolidate
> > both of them into one, and pull up all of LMP features to MP. I think all of
> > LMP's features are useful for any kind of MP, and if someone wants to ignore
> > them he still can.
> >
> > This is not the sort of change that fits well in trunk. IMO it can fit well
> > in 3x too since IW didn't accept anything that is not LMP. So even if it
> > will appear we're breaking back-compat, we actually won't. Which is another
> > reason, for me, why those two should be consolidated.
> >
> > What do you think?
> >
> > Shai
> >
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
