lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: BlockJoin concerns
Date Fri, 14 Oct 2011 12:56:48 GMT
Hi Mark,

I opened LUCENE-3519 for the unexpected null when pulling the
TopGroups, and added your test case (thanks!).

On Concern #2, this is not limited today: the collector internally
gathers all child docIDs for each collected parent docID. Only at the
end, when you ask for the top groups, does it sort the child docIDs
within each group and keep the topN you passed to it.
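
As a rough illustration of that "gather everything, sort and trim on
request" behavior, here is a minimal standalone sketch. It is not the
actual BlockJoinCollector code: the class DeferredTopNSketch, its
methods, and the ChildHit type are all hypothetical names made up for
this example, and it sorts by score only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Lucene's implementation: every child hit is kept
// during collection, and the per-group limit is applied only on request.
public class DeferredTopNSketch {

    // Hypothetical stand-in for one collected child document.
    static final class ChildHit {
        final int docId;
        final float score;

        ChildHit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // All child hits gathered so far, keyed by parent docID (no limit applied).
    private final Map<Integer, List<ChildHit>> childrenByParent = new LinkedHashMap<>();

    // Called while searching: remember every child under its parent.
    void collect(int parentDocId, int childDocId, float score) {
        childrenByParent
            .computeIfAbsent(parentDocId, k -> new ArrayList<>())
            .add(new ChildHit(childDocId, score));
    }

    // Called after searching: sort each group by descending score and
    // keep at most maxDocsPerGroup children per parent.
    Map<Integer, List<ChildHit>> topGroups(int maxDocsPerGroup) {
        Map<Integer, List<ChildHit>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<ChildHit>> e : childrenByParent.entrySet()) {
            List<ChildHit> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingDouble((ChildHit h) -> h.score).reversed());
            int keep = Math.min(maxDocsPerGroup, sorted.size());
            result.put(e.getKey(), new ArrayList<>(sorted.subList(0, keep)));
        }
        return result;
    }

    public static void main(String[] args) {
        DeferredTopNSketch c = new DeferredTopNSketch();
        c.collect(3, 0, 0.5f);
        c.collect(3, 1, 0.9f);
        c.collect(3, 2, 0.1f);

        // Different limits can be requested afterwards without searching again.
        System.out.println(c.topGroups(2).get(3).size());
        System.out.println(c.topGroups(10).get(3).size());
    }
}
```

Because nothing is discarded while collecting, asking for a larger
per-group limit later never requires repeating the search in this
sketch; the cost is the memory held for every child hit.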

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 14, 2011 at 7:09 AM, mark harwood <markharw00d@yahoo.co.uk> wrote:
> I've been looking at the BlockJoin stuff in 3.4 in relation to children of multiple types
> and have a couple of concerns which are either issues, or my ignorance of the API:
>
> Concern #1
> ========
> If I only retrieve children of type A all is well.
>
> If I only retrieve children of type B all is well.
> If I try to retrieve children of type A and then B I get a null TopGroups returned for B.
> (test code for this at the end of this email)
>
>
> Concern #2
> ========
> I'm not sure where I get to control how many children of type A and of type B are returned
> per parent?
> BlockJoinCollector's constructor only controls how many parents are collected.
>
> *Post-search* I can call BlockJoinCollector.getTopGroups(childQueryA,...maxDocsPerGroup..)
> to define how many children I get back. Does this imply that if I ask for more child docs than
> are cached by the collector, the search is somehow automatically repeated?
> If so, what would be the "default" number of child docs cached by the collector and where
> would I set that?
>
> Cheers
> Mark
>
>
> Below is the code I added to the existing TestBlockJoin which exercises the above.
>
> //=================================
>
>  public void testMultiChildTypes() throws Exception {
>
>     final Directory dir = newDirectory();
>     final RandomIndexWriter w = new RandomIndexWriter(random, dir);
>
>     final List<Document> docs = new ArrayList<Document>();
>
>     docs.add(makeJob("java", 2007));
>     docs.add(makeJob("python", 2010));
>     docs.add(makeQualification("maths", 1999));
>     docs.add(makeResume("Lisa", "United Kingdom"));
>     w.addDocuments(docs);
>
>     IndexReader r = w.getReader();
>     w.close();
>     IndexSearcher s = new IndexSearcher(r);
>
>     // Create a filter that defines "parent" documents in the index - in this case resumes
>     Filter parentsFilter = new CachingWrapperFilter(
>         new QueryWrapperFilter(new TermQuery(new Term("docType", "resume"))));
>
>     // Define child document criteria (finds an example of relevant work experience)
>     BooleanQuery childJobQuery = new BooleanQuery();
>     childJobQuery.add(new BooleanClause(
>         new TermQuery(new Term("skill", "java")), Occur.MUST));
>     childJobQuery.add(new BooleanClause(
>         NumericRangeQuery.newIntRange("year", 2006, 2011, true, true), Occur.MUST));
>
>     BooleanQuery childQualificationQuery = new BooleanQuery();
>     childQualificationQuery.add(new BooleanClause(
>         new TermQuery(new Term("qualification", "maths")), Occur.MUST));
>     childQualificationQuery.add(new BooleanClause(
>         NumericRangeQuery.newIntRange("year", 1980, 2000, true, true), Occur.MUST));
>
>
>     // Define parent document criteria (find a resident in the UK)
>     Query parentQuery = new TermQuery(new Term("country", "United Kingdom"));
>
>     // Wrap the child document query to 'join' any matches
>     // up to corresponding parent:
>     BlockJoinQuery childJobJoinQuery = new BlockJoinQuery(
>         childJobQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);
>     BlockJoinQuery childQualificationJoinQuery = new BlockJoinQuery(
>         childQualificationQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);
>
>     // Combine the parent and nested child queries into a single query for a candidate
>     BooleanQuery fullQuery = new BooleanQuery();
>     fullQuery.add(new BooleanClause(parentQuery, Occur.MUST));
>     fullQuery.add(new BooleanClause(childJobJoinQuery, Occur.MUST));
>     fullQuery.add(new BooleanClause(childQualificationJoinQuery, Occur.MUST));
>
>     //????? How do I control volume of jobs vs qualifications per parent?
>     BlockJoinCollector c = new BlockJoinCollector(Sort.RELEVANCE, 10, true, false);
>
>     s.search(fullQuery, c);
>
>     //Examine "Job" children
>     boolean showNullPointerIssue=true;
>     if(showNullPointerIssue)
>     {
>     TopGroups<Integer> jobResults = c.getTopGroups(childJobJoinQuery, null, 0, 10, 0, true);
>
>     //assertEquals(1, results.totalHitCount);
>     assertEquals(1, jobResults.totalGroupedHitCount);
>     assertEquals(1, jobResults.groups.length);
>
>     final GroupDocs<Integer> group = jobResults.groups[0];
>     assertEquals(1, group.totalHits);
>
>     Document childJobDoc = s.doc(group.scoreDocs[0].doc);
>     //System.out.println("  doc=" + group.scoreDocs[0].doc);
>     assertEquals("java", childJobDoc.get("skill"));
>     assertNotNull(group.groupValue);
>     Document parentDoc = s.doc(group.groupValue);
>     assertEquals("Lisa", parentDoc.get("name"));
>     }
>
>     //Now Examine qualification children
>     TopGroups<Integer> qualificationResults =
>         c.getTopGroups(childQualificationJoinQuery, null, 0, 10, 0, true);
>
>     //!!!!! This next line can null pointer - but only if prior "jobs" section called first
>     assertEquals(1, qualificationResults.totalGroupedHitCount);
>     assertEquals(1, qualificationResults.groups.length);
>
>     final GroupDocs<Integer> qGroup = qualificationResults.groups[0];
>     assertEquals(1, qGroup.totalHits);
>
>     Document childQualificationDoc = s.doc(qGroup.scoreDocs[0].doc);
>     assertEquals("maths", childQualificationDoc.get("qualification"));
>     assertNotNull(qGroup.groupValue);
>     Document parentDoc = s.doc(qGroup.groupValue);
>     assertEquals("Lisa", parentDoc.get("name"));
>
>
>     r.close();
>     dir.close();
>   }
>
>
>   // ... has multiple qualifications
>   private Document makeQualification(String qualification, int year) {
>     Document doc = new Document();
>     doc.add(newField("qualification", qualification, Field.Store.YES, Field.Index.NOT_ANALYZED));
>     doc.add(new NumericField("year").setIntValue(year));
>     return doc;
>   }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
