From: mark harwood
Date: Fri, 14 Oct 2011 14:10:28 +0100 (BST)
To: dev@lucene.apache.org
Subject: Re: BlockJoin concerns

>>I opened LUCENE-3519 for the unexpected null when pulling the
>>TopGroups, and added your test case (thanks!).

Great, thanks.

>>the collector internally gathers all child docIDs for a given collected parent docID

OK - I guess that scales OK because the number of docIDs per parent is naturally limited by the number of docs you can hold in RAM as part of the original IndexWriter.addDocuments call - i.e. not in the millions.

Cheers,
Mark


----- Original Message -----
From: Michael McCandless
To: dev@lucene.apache.org; mark harwood
Sent: Friday, 14 October 2011, 13:56
Subject: Re: BlockJoin concerns

Hi Mark,

I opened LUCENE-3519 for the unexpected null when pulling the
TopGroups, and added your test case (thanks!).

On Concern #2, this is not limited today: the collector internally
gathers all child docIDs for a given collected parent docID, and only
at the end, when you ask for the top groups, does it sort the child docIDs
within each group and keep the topN you passed to it.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 14, 2011 at 7:09 AM, mark harwood wrote:
> I've been looking at the BlockJoin stuff in 3.4 in relation to children of multiple types and have a couple of concerns which are either issues, or my ignorance of the API:
>
> Concern #1
> ========
> If I only retrieve children of type A all is well.
>
> If I only retrieve children of type B all is well.
> If I try to retrieve children of type A and then B, I get a null TopGroups returned for B.
> (test code for this at the end of this email)
>
>
> Concern #2
> ========
> I'm not sure where I get to control how many children of type A and of type B are returned per parent.
> BlockJoinCollector's constructor only controls how many parents are collected.
>
> *Post-search* I can call BlockJoinCollector.getTopGroups(childQueryA, ...maxDocsPerGroup..) to define how many children I get back. Does this imply that if I ask for more child docs than are cached by the collector, the search is somehow automatically repeated?
> If so, what would be the "default" number of child docs cached by the collector and where would I set that?
>
> Cheers
> Mark
>
>
> Below is the code I added to the existing TestBlockJoin which exercises the above.
>
> //=================================
>
>   public void testMultiChildTypes() throws Exception {
>
>     final Directory dir = newDirectory();
>     final RandomIndexWriter w = new RandomIndexWriter(random, dir);
>
>     // One block: children first, parent (the resume) last
>     final List<Document> docs = new ArrayList<Document>();
>
>     docs.add(makeJob("java", 2007));
>     docs.add(makeJob("python", 2010));
>     docs.add(makeQualification("maths", 1999));
>     docs.add(makeResume("Lisa", "United Kingdom"));
>     w.addDocuments(docs);
>
>     IndexReader r = w.getReader();
>     w.close();
>     IndexSearcher s = new IndexSearcher(r);
>
>     // Create a filter that defines "parent" documents in the index - in this case resumes
>     Filter parentsFilter = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("docType", "resume"))));
>
>     // Define child document criteria (finds an example of relevant work experience)
>     BooleanQuery childJobQuery = new BooleanQuery();
>     childJobQuery.add(new BooleanClause(new TermQuery(new Term("skill", "java")), Occur.MUST));
>     childJobQuery.add(new BooleanClause(NumericRangeQuery.newIntRange("year", 2006, 2011, true, true), Occur.MUST));
>
>     BooleanQuery childQualificationQuery = new BooleanQuery();
>     childQualificationQuery.add(new BooleanClause(new TermQuery(new Term("qualification", "maths")), Occur.MUST));
>     childQualificationQuery.add(new BooleanClause(NumericRangeQuery.newIntRange("year", 1980, 2000, true, true), Occur.MUST));
>
>
>     // Define parent document criteria (find a resident in the UK)
>     Query parentQuery = new TermQuery(new Term("country", "United Kingdom"));
>
>     // Wrap the child document query to 'join' any matches
>     // up to the corresponding parent:
>     BlockJoinQuery childJobJoinQuery = new BlockJoinQuery(childJobQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);
>     BlockJoinQuery childQualificationJoinQuery = new BlockJoinQuery(childQualificationQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);
>
>     // Combine the parent and nested child queries into a single query for a candidate
>     BooleanQuery fullQuery = new BooleanQuery();
>     fullQuery.add(new BooleanClause(parentQuery, Occur.MUST));
>     fullQuery.add(new BooleanClause(childJobJoinQuery, Occur.MUST));
>     fullQuery.add(new BooleanClause(childQualificationJoinQuery, Occur.MUST));
>
>     //????? How do I control the number of jobs vs. qualifications returned per parent?
>     BlockJoinCollector c = new BlockJoinCollector(Sort.RELEVANCE, 10, true, false);
>
>     s.search(fullQuery, c);
>
>     // Examine "Job" children
>     boolean showNullPointerIssue = true;
>     if (showNullPointerIssue) {
>       TopGroups<Integer> jobResults = c.getTopGroups(childJobJoinQuery, null, 0, 10, 0, true);
>
>       //assertEquals(1, results.totalHitCount);
>       assertEquals(1, jobResults.totalGroupedHitCount);
>       assertEquals(1, jobResults.groups.length);
>
>       final GroupDocs<Integer> group = jobResults.groups[0];
>       assertEquals(1, group.totalHits);
>
>       Document childJobDoc = s.doc(group.scoreDocs[0].doc);
>       //System.out.println("  doc=" + group.scoreDocs[0].doc);
>       assertEquals("java", childJobDoc.get("skill"));
>       assertNotNull(group.groupValue);
>       Document parentDoc = s.doc(group.groupValue);
>       assertEquals("Lisa", parentDoc.get("name"));
>     }
>
>     // Now examine qualification children
>     TopGroups<Integer> qualificationResults = c.getTopGroups(childQualificationJoinQuery, null, 0, 10, 0, true);
>
>     //!!!!! This next line can throw a NullPointerException, but only if the "jobs" section above runs first
>     assertEquals(1, qualificationResults.totalGroupedHitCount);
>     assertEquals(1, qualificationResults.groups.length);
>
>     final GroupDocs<Integer> qGroup = qualificationResults.groups[0];
>     assertEquals(1, qGroup.totalHits);
>
>     Document childQualificationDoc = s.doc(qGroup.scoreDocs[0].doc);
>     assertEquals("maths", childQualificationDoc.get("qualification"));
>     assertNotNull(qGroup.groupValue);
>     Document parentDoc = s.doc(qGroup.groupValue);
>     assertEquals("Lisa", parentDoc.get("name"));
>
>
>     r.close();
>     dir.close();
>   }
>
>
>   // ... has multiple qualifications
>   private Document makeQualification(String qualification, int year) {
>     Document doc = new Document();
>     doc.add(newField("qualification", qualification, Field.Store.YES, Field.Index.NOT_ANALYZED));
>     doc.add(new NumericField("year").setIntValue(year));
>     return doc;
>   }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org