lucene-java-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: ToChildBlockJoinQuery question
Date Thu, 22 Jan 2015 16:45:52 GMT
I think the idea is that you create a block join query that encapsulates
the join relation, and then you can add further constraints in the
result document space. In the case of ToChildBlockJoinQuery, the result
documents are child documents, so any additional query constraints will
be applied to child documents.  For example, you could create the
following:

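ToChildBlockJoinQuery bjq = jamesBJQ();
TermQuery tq = new TermQuery(new Term("title", "doctor"));
// Both clauses are required: the join result AND the child-side term.
BooleanQuery bq = new BooleanQuery();
bq.add(bjq, Occur.MUST);
bq.add(tq, Occur.MUST);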

bq would then match books with parent (i.e. author) restrictions defined
in jamesBJQ(), and child (i.e. book) restrictions defined by other queries
like tq (title:doctor).
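
To make the semantics concrete, here is a toy model in plain Java. It is
not Lucene code and does not reproduce Lucene's implementation; the class
name ToChildJoinSketch, the helper bits(), and the five-document index are
all made up for illustration. It assumes the block layout discussed in
this thread (children first, parent last) and shows a to-child join
followed by an extra child-side constraint:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: this mimics the idea of a to-child block join with
// java.util.BitSet and does not use or reproduce Lucene's implementation.
class ToChildJoinSketch {

    // Hypothetical index: each block stores its children (books) first
    // and its parent (author) last.
    static final String[] DOCS = {
        "book: The Story of Doctor Dolittle", // doc 0
        "book: The Twilight of Magic",        // doc 1
        "author: Lofting",                    // doc 2, parent of docs 0-1
        "book: Pontius Pilate",               // doc 3
        "author: Maier"                       // doc 4, parent of doc 3
    };

    // Builds a BitSet with the given doc IDs set.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    // Returns the children of every matching parent that also satisfy
    // the child-side constraint.
    static List<Integer> toChildren(BitSet allParents, BitSet matchingParents,
                                    BitSet childConstraint) {
        List<Integer> hits = new ArrayList<>();
        for (int parent = matchingParents.nextSetBit(0); parent >= 0;
                parent = matchingParents.nextSetBit(parent + 1)) {
            // A block starts right after the previous parent, or at doc 0.
            int firstChild = (parent == 0) ? 0 : allParents.previousSetBit(parent - 1) + 1;
            for (int child = firstChild; child < parent; child++) {
                if (childConstraint.get(child)) {
                    hits.add(child);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet allParents = bits(2, 4);   // plays the role of parentsFilter
        BitSet lofting = bits(2);         // plays the role of parentQuery
        BitSet titleDoctor = bits(0);     // plays the role of tq (title:doctor)
        for (int doc : toChildren(allParents, lofting, titleDoctor)) {
            System.out.println(doc + " -> " + DOCS[doc]);
        }
    }
}
```

In this sketch the join alone would return docs 0 and 1 (both books in
the Lofting block); the child-side constraint then narrows that to doc 0,
which is the same division of labor as bjq and tq above.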

-Mike

On 1/22/15 11:27 AM, McKinley, James T wrote:
> Hi Greg,
>
> Thanks for describing how block join queries were intended to work.  Your description makes sense to me; however, according to the API docs:
>
> http://lucene.apache.org/core/4_8_0/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html
>
> and particularly the naming of the parameters I don't think the API actually works as you described:
>
> 	ToChildBlockJoinQuery(Query parentQuery, Filter parentsFilter, boolean doScores)
>
> If the filter was intended to filter the child docs I think it would be called childFilter, no?
>
> I think the use of the CachingWrapperFilter in the example I got from Mike McCandless' blog post was the real cause of the exception I was seeing (maybe things have changed internally since that post).  I finally noticed a mention of the FixedBitSetCachingWrapperFilter in the description of the ToChildBlockJoinQuery constructor in the API docs.  When I changed to using a filter produced by the FixedBitSetCachingWrapperFilter class, the IllegalStateException no longer occurred, and I now get the child docs using ToChildBlockJoinQuery with a parent doc filter and parent doc query, and the results look correctly limited by the parent constraints.  For example:
>
> ...
> Gub-Gub's Book: An Encyclopedia of Food (Fictional work), Fictional work, 119320101
> 	by: Lofting, Hugh - NP, American, Writer
>
> The Story of Doctor Dolittle, Being the History of His Peculiar Life at Home and Astonishing Adventures in Foreign Parts (Novel), Novel, 119200101
> 	by: Lofting, Hugh - NP, American, Writer
>
> The Voyages of Doctor Dolittle (Novel), Novel, 119220101
> 	by: Lofting, Hugh - NP, American, Writer
>
> The Story of Doctor Dolittle (Novel), Novel, 119200101
> 	by: Lofting, Hugh - NP, American, Writer
>
> ...
> Mister Beers (Poem), Poem, null
> 	by: Lofting, Hugh - NP, American, Writer
>
> The Twilight of Magic (Novel), Novel, 119300101
> 	by: Lofting, Hugh - NP, American, Writer
>
> Picnic (Lofting, Hugh) (Poem), Poem, null
> 	by: Lofting, Hugh - NP, American, Writer
>
> The Impossible Patriotism Project (Picture story), Picture story, 120070101
>
> A Skeleton in God's Closet: A Novel (Novel), Novel, 119940101
> 	by: Maier, Paul Luther - NP, American, null
>
> Pontius Pilate (Novel), Novel, 119680101
> 	by: Maier, Paul Luther - NP, American, null
>
> ...
> Josephus: The Essential Writings (Collection), Collection, 119880101
> 	by: Maier, Paul Luther - NP, American, null
>
> She Said the Geese (Poem), Poem, null
> 	by: Lifshin, Lyn - NP, American, Poet
>
> She Said She Could See Music (Poem), Poem, null
> 	by: Lifshin, Lyn - NP, American, Poet
> ...
>
> However I see no way to further limit the children as you describe.  If I use "a query that matches the set of parents and a filter that matches the set of children" as you suggest, I get no results back.  I think your description of how it should work makes complete sense, but that is not what I'm seeing when I try it.  Here's the code that produced the above output:
>
> 	private void runToChildBlockJoinQuery(String indexPath) throws IOException {
> 		FSDirectory dir = FSDirectory.open(new File(indexPath));
> 		IndexReader reader = DirectoryReader.open(dir);
> 		IndexSearcher searcher = new IndexSearcher(reader);
> 		
> 		TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
> 		BooleanQuery parentQuery = new BooleanQuery();
> 		parentQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
> 		parentQuery.add(new TermQuery(new Term("NT", "american")), Occur.MUST);
> 		
> 		Filter parentFilter = new FixedBitSetCachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
>
> 		ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
> 		
> 		TopDocs worksDocs = searcher.search(tcbjq, 5000);
> 		
> 		System.out.println("\n*ToChildBlockJoinQuery hit count = " + worksDocs.scoreDocs.length);
> 		displayWorks(reader, searcher, worksDocs);
> 	}
>
> 	private void displayWorks(IndexReader reader, IndexSearcher searcher, TopDocs worksDocs) throws IOException {
> 		for (int i = 0; i < worksDocs.scoreDocs.length; i++) {
> 			String agdn = reader.document(worksDocs.scoreDocs[i].doc).get("AGDN");
> 			String tw = reader.document(worksDocs.scoreDocs[i].doc).get("TW");
> 			String pd = reader.document(worksDocs.scoreDocs[i].doc).get("PD");
> 			String crid = reader.document(worksDocs.scoreDocs[i].doc).get("CRID");
> 			TopDocs creatorDocs = searcher.search(new TermQuery(new Term("ABID", crid)), Integer.MAX_VALUE);
> 			System.out.println("\n" + agdn + ", " + tw + ", " + pd);
> 			displayCreators(reader, searcher, creatorDocs);
> 		}
> 	}
>
> 	private void displayCreators(IndexReader reader, IndexSearcher searcher, TopDocs worksDocs) throws IOException {
> 		for (int i = 0; i < worksDocs.scoreDocs.length; i++) {
> 			String agdn = reader.document(worksDocs.scoreDocs[i].doc).get("AGDN");
> 			String agty = reader.document(worksDocs.scoreDocs[i].doc).get("AGTY");
> 			String nt = reader.document(worksDocs.scoreDocs[i].doc).get("NT");
> 			String poc = reader.document(worksDocs.scoreDocs[i].doc).get("POC");
> 			System.out.println("\tby: " + agdn + " - " + agty + ", " +nt + ", " + poc);
> 		}
> 	}
>
> When I try to use ToParentBlockJoinQuery I don't get any results either, and it is not what I really want anyway; I want the child documents limited by the parent documents.
>
> ToChildBlockJoinQuery almost gives me what I want, but I really need to be able to filter the child docs returned as well as the parent from which they came.  If you (or anybody) still think I'm doing it wrong, please let me know.  If I should file a bug report, let me know that too; I have a small index I can provide if it is useful.  Thanks again for your help.
>
> Jim
>
> ________________________________________
> From: Gregory Dearing [gregdearing@gmail.com]
> Sent: Wednesday, January 21, 2015 6:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: ToChildBlockJoinQuery question
>
> Jim,
>
> I think you hit the nail on the head... that's not what BlockJoinQueries do.
>
> If you're wanting to search for children and join to their parents... then
> use ToParentBlockJoinQuery, with a query that matches the set of children
> and a filter that matches the set of parents.
>
> If you're searching for parents, then joining to their children... then use
> ToChildBlockJoinQuery, with a query that matches the set of parents and a
> filter that matches the set of children.
>
> When you add related documents to the index (via addDocuments), make sure that
> children are added before their parents.
>
> The reason all the above is necessary is that it makes it possible to have
> a nested hierarchy of relationships (i.e. Parents have Children, which have
> Children of their own).  You need a query to indicate which part of the
> hierarchy you're starting from, and a filter indicating which part of the
> hierarchy you're joining to.
>
> Also, you will always get an exception if your query and your filter both
> match the same document.  A child can't be its own parent.
>
> BlockJoin is a very powerful feature, but what it's really doing is
> modelling relationships using an index that doesn't know what a
> relationship is.  The relationships are determined by a combination of the
> order that you indexed the block, and the format of your query.  This
> disconnect can lead to some weird behavior if you're not absolutely sure how
> it works.
>
> Thanks,
> Greg
>
>
>
>
>
> On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <
> james.mckinley@cengage.com> wrote:
>
>> Am I understanding how this is supposed to work?  What I think I am (and
>> should be) doing is providing a query and filter that specifies the parent
>> docs and the ToChildBlockJoinQuery should return me all the child docs for
>> the resulting parent docs.  Is this correct?  The reason I think I'm not
>> understanding is that I don't see why I need both a filter and a query to
>> specify the parent docs when a single query or filter should suffice.  Am I
>> misunderstanding what parentQuery and parentFilter mean?  They both refer to
>> parent docs, right?
>>
>> Jim
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>