lucene-dev mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject BlockJoin concerns
Date Fri, 14 Oct 2011 11:09:54 GMT
I've been looking at the BlockJoin code in 3.4 with parents that have children of multiple
types, and have a couple of concerns that are either genuine issues or gaps in my
understanding of the API:

Concern #1
========
If I only retrieve children of type A all is well.

If I only retrieve children of type B all is well.
If I try to retrieve children of type A and then B, I get a null TopGroups returned for B.
(test code for this at the end of this email)


Concern #2
========
I'm not sure where I control how many children of type A and of type B are returned
per parent.
BlockJoinCollector's constructor only controls how many parents are collected.

*Post-search* I can call BlockJoinCollector.getTopGroups(childQueryA, ..., maxDocsPerGroup, ...)
to define how many children I get back. Does this imply that, if I ask for more child docs
than are cached by the collector, the search is somehow automatically repeated?
If so, what is the "default" number of child docs cached by the collector, and where
would I set that?
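
To make the question concrete, here is a minimal plain-Java sketch of the behaviour I have in
mind: a separate cap for each child type under a single parent. It is only an illustration.
The class PerTypeChildCap, the method cap and the string "type" keys are all hypothetical
names and none of them are part of the Lucene API; it says nothing about how
BlockJoinCollector actually caches child docs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Lucene.
final class PerTypeChildCap {

    /**
     * Keeps at most maxPerType.get(type) children of each type, preserving
     * the input order. Types with no entry in maxPerType are dropped.
     */
    static Map<String, List<String>> cap(Map<String, List<String>> childrenByType,
                                         Map<String, Integer> maxPerType) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : childrenByType.entrySet()) {
            Integer limit = maxPerType.get(entry.getKey());
            if (limit == null) {
                continue; // no cap configured for this child type, so skip it
            }
            List<String> children = entry.getValue();
            // Clamp the limit to the range [0, children.size()]
            int keep = Math.min(Math.max(limit, 0), children.size());
            result.put(entry.getKey(), new ArrayList<>(children.subList(0, keep)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Children of one parent (a resume), grouped by child type
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("job", Arrays.asList("java 2007", "python 2010"));
        children.put("qualification", Arrays.asList("maths 1999"));

        // Independent caps per child type
        Map<String, Integer> limits = new LinkedHashMap<>();
        limits.put("job", 1);
        limits.put("qualification", 5);

        System.out.println(cap(children, limits));
    }
}
```

In other words, I would like to ask for something like "at most 1 job and at most 5
qualifications per parent" rather than a single limit shared by every child type.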

Cheers
Mark


Below is the code I added to the existing TestBlockJoin which exercises the above.

//=================================

  public void testMultiChildTypes() throws Exception {

    final Directory dir = newDirectory();
    final RandomIndexWriter w = new RandomIndexWriter(random, dir);

    final List<Document> docs = new ArrayList<Document>();

    docs.add(makeJob("java", 2007));
    docs.add(makeJob("python", 2010));
    docs.add(makeQualification("maths", 1999));
    docs.add(makeResume("Lisa", "United Kingdom"));
    w.addDocuments(docs);

    IndexReader r = w.getReader();
    w.close();
    IndexSearcher s = new IndexSearcher(r);

    // Create a filter that defines "parent" documents in the index - in this case resumes
    Filter parentsFilter = new CachingWrapperFilter(
        new QueryWrapperFilter(new TermQuery(new Term("docType", "resume"))));

    // Define child document criteria (finds an example of relevant work experience)
    BooleanQuery childJobQuery = new BooleanQuery();
    childJobQuery.add(new BooleanClause(new TermQuery(new Term("skill", "java")), Occur.MUST));
    childJobQuery.add(new BooleanClause(
        NumericRangeQuery.newIntRange("year", 2006, 2011, true, true), Occur.MUST));

    BooleanQuery childQualificationQuery = new BooleanQuery();
    childQualificationQuery.add(new BooleanClause(
        new TermQuery(new Term("qualification", "maths")), Occur.MUST));
    childQualificationQuery.add(new BooleanClause(
        NumericRangeQuery.newIntRange("year", 1980, 2000, true, true), Occur.MUST));


    // Define parent document criteria (find a resident in the UK)
    Query parentQuery = new TermQuery(new Term("country", "United Kingdom"));

    // Wrap the child document query to 'join' any matches
    // up to corresponding parent:
    BlockJoinQuery childJobJoinQuery = new BlockJoinQuery(
        childJobQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);
    BlockJoinQuery childQualificationJoinQuery = new BlockJoinQuery(
        childQualificationQuery, parentsFilter, BlockJoinQuery.ScoreMode.Avg);

    // Combine the parent and nested child queries into a single query for a candidate
    BooleanQuery fullQuery = new BooleanQuery();
    fullQuery.add(new BooleanClause(parentQuery, Occur.MUST));
    fullQuery.add(new BooleanClause(childJobJoinQuery, Occur.MUST));
    fullQuery.add(new BooleanClause(childQualificationJoinQuery, Occur.MUST));

    //????? How do I control volume of jobs vs qualifications per parent?
    BlockJoinCollector c = new BlockJoinCollector(Sort.RELEVANCE, 10, true, false);

    s.search(fullQuery, c);

    // Examine "Job" children
    boolean showNullPointerIssue = true;
    if (showNullPointerIssue) {
      TopGroups<Integer> jobResults = c.getTopGroups(childJobJoinQuery, null, 0, 10, 0, true);

      //assertEquals(1, results.totalHitCount);
      assertEquals(1, jobResults.totalGroupedHitCount);
      assertEquals(1, jobResults.groups.length);

      final GroupDocs<Integer> group = jobResults.groups[0];
      assertEquals(1, group.totalHits);

      Document childJobDoc = s.doc(group.scoreDocs[0].doc);
      //System.out.println("  doc=" + group.scoreDocs[0].doc);
      assertEquals("java", childJobDoc.get("skill"));
      assertNotNull(group.groupValue);
      Document parentDoc = s.doc(group.groupValue);
      assertEquals("Lisa", parentDoc.get("name"));
    }

    // Now examine qualification children
    TopGroups<Integer> qualificationResults =
        c.getTopGroups(childQualificationJoinQuery, null, 0, 10, 0, true);

    //!!!!! The next line can throw a NullPointerException, but only if the "jobs"
    //      section above was run first
    assertEquals(1, qualificationResults.totalGroupedHitCount);
    assertEquals(1, qualificationResults.groups.length);

    final GroupDocs<Integer> qGroup = qualificationResults.groups[0];
    assertEquals(1, qGroup.totalHits);

    Document childQualificationDoc = s.doc(qGroup.scoreDocs[0].doc);
    assertEquals("maths", childQualificationDoc.get("qualification"));
    assertNotNull(qGroup.groupValue);
    Document parentDoc = s.doc(qGroup.groupValue);
    assertEquals("Lisa", parentDoc.get("name"));


    r.close();
    dir.close();
  }


  // ... has multiple qualifications
  private Document makeQualification(String qualification, int year) {
    // Build a child document holding a single qualification and the year it was gained
    Document doc = new Document();
    doc.add(newField("qualification", qualification, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new NumericField("year").setIntValue(year));
    return doc;
  }
