lucene-java-user mailing list archives

From "McKinley, James T" <james.mckin...@cengage.com>
Subject RE: ToChildBlockJoinQuery question
Date Thu, 22 Jan 2015 16:27:08 GMT
Hi Greg,

Thanks for describing how block join queries were intended to work.  Your description makes
sense to me; however, according to the API docs:

http://lucene.apache.org/core/4_8_0/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html

and particularly the naming of the parameters, I don't think the API actually works as you
described:

	ToChildBlockJoinQuery(Query parentQuery, Filter parentsFilter, boolean doScores)

If the filter were intended to filter the child docs, I think it would be called childFilter,
no?

I think the use of the CachingWrapperFilter in the example I got from Mike McCandless' blog
post was the real cause of the exception I was seeing (maybe things have changed internally
since that post).  I finally noticed a mention of the FixedBitSetCachingWrapperFilter in the
description of the ToChildBlockJoinQuery constructor in the API docs.  When I changed to using
a filter produced by the FixedBitSetCachingWrapperFilter class, the IllegalStateException no
longer occurred.  I now get the child docs using ToChildBlockJoinQuery with a parent doc filter
and a parent doc query, and the results look correctly limited by the parent constraints.  For
example:

...
Gub-Gub's Book: An Encyclopedia of Food (Fictional work), Fictional work, 119320101
	by: Lofting, Hugh - NP, American, Writer

The Story of Doctor Dolittle, Being the History of His Peculiar Life at Home and Astonishing Adventures in Foreign Parts (Novel), Novel, 119200101
	by: Lofting, Hugh - NP, American, Writer

The Voyages of Doctor Dolittle (Novel), Novel, 119220101
	by: Lofting, Hugh - NP, American, Writer

The Story of Doctor Dolittle (Novel), Novel, 119200101
	by: Lofting, Hugh - NP, American, Writer

...
Mister Beers (Poem), Poem, null
	by: Lofting, Hugh - NP, American, Writer

The Twilight of Magic (Novel), Novel, 119300101
	by: Lofting, Hugh - NP, American, Writer

Picnic (Lofting, Hugh) (Poem), Poem, null
	by: Lofting, Hugh - NP, American, Writer

The Impossible Patriotism Project (Picture story), Picture story, 120070101

A Skeleton in God's Closet: A Novel (Novel), Novel, 119940101
	by: Maier, Paul Luther - NP, American, null

Pontius Pilate (Novel), Novel, 119680101
	by: Maier, Paul Luther - NP, American, null

...
Josephus: The Essential Writings (Collection), Collection, 119880101
	by: Maier, Paul Luther - NP, American, null

She Said the Geese (Poem), Poem, null
	by: Lifshin, Lyn - NP, American, Poet

She Said She Could See Music (Poem), Poem, null
	by: Lifshin, Lyn - NP, American, Poet
...

However, I see no way to further limit the children as you describe.  If I use "a query that
matches the set of parents and a filter that matches the set of children" as you suggest, I
get no results back.  Your description of how it should work makes complete sense, but that
is not what I'm seeing when I try it.  Here's the code that produced the above output:

	private void runToChildBlockJoinQuery(String indexPath) throws IOException {
		FSDirectory dir = FSDirectory.open(new File(indexPath));
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
		
		// Matches every parent doc (AGTY = np); used only to build the parents filter.
		TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
		// Selects the parents whose children should be returned (AGTY = np AND NT = american).
		BooleanQuery parentQuery = new BooleanQuery();
		parentQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
		parentQuery.add(new TermQuery(new Term("NT", "american")), Occur.MUST);
		
		// FixedBitSetCachingWrapperFilter is the filter class named in the constructor's API docs.
		Filter parentFilter = new FixedBitSetCachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));

		ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
		
		TopDocs worksDocs = searcher.search(tcbjq, 5000);
		
		System.out.println("\n*ToChildBlockJoinQuery hit count = " + worksDocs.scoreDocs.length);
		displayWorks(reader, searcher, worksDocs);
	}

	private void displayWorks(IndexReader reader, IndexSearcher searcher, TopDocs worksDocs) throws IOException {
		for (int i = 0; i < worksDocs.scoreDocs.length; i++) {
			// Load the stored fields once per hit instead of once per field.
			Document work = reader.document(worksDocs.scoreDocs[i].doc);
			String agdn = work.get("AGDN");
			String tw = work.get("TW");
			String pd = work.get("PD");
			String crid = work.get("CRID");
			// Look up the creator docs whose ABID matches this work's CRID.
			TopDocs creatorDocs = searcher.search(new TermQuery(new Term("ABID", crid)), Integer.MAX_VALUE);
			System.out.println("\n" + agdn + ", " + tw + ", " + pd);
			displayCreators(reader, searcher, creatorDocs);
		}
	}

	private void displayCreators(IndexReader reader, IndexSearcher searcher, TopDocs creatorDocs) throws IOException {
		for (int i = 0; i < creatorDocs.scoreDocs.length; i++) {
			// Load the stored fields once per hit instead of once per field.
			Document creator = reader.document(creatorDocs.scoreDocs[i].doc);
			String agdn = creator.get("AGDN");
			String agty = creator.get("AGTY");
			String nt = creator.get("NT");
			String poc = creator.get("POC");
			System.out.println("\tby: " + agdn + " - " + agty + ", " + nt + ", " + poc);
		}
	}

When I try to use ToParentBlockJoinQuery I don't get any results either, and it is not what
I really want anyway: I want the child documents limited by the parent documents.

ToChildBlockJoinQuery almost gives me what I want, but I really need to be able to filter
the child docs returned as well as the parents from which they came.  If you (or anybody) still
think I'm doing it wrong, please let me know.  If I should file a bug report, let me know that
too; I have a small index I can provide if it is useful.  Thanks again for your help.
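
To make the behavior I'm after concrete, here is a minimal plain-Java model of it, with no Lucene involved.  It is only a sketch under stated assumptions: the class BlockJoinModel, its method names, and the six-document layout are hypothetical, and the bit sets merely stand in for doc ID sets rather than reproducing Lucene's internals.  It shows a to-child join driven by a parents filter and a parent match set, followed by an intersection with a child-level match set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical plain-Java model of a to-child block join; not Lucene code.
final class BlockJoinModel {

    // Returns the child doc IDs of every matched parent. The children of a parent
    // are the docs between the previous parent (exclusive) and the parent itself.
    static List<Integer> toChildren(BitSet parentsFilter, BitSet parentMatches) {
        List<Integer> children = new ArrayList<>();
        for (int p = parentMatches.nextSetBit(0); p >= 0; p = parentMatches.nextSetBit(p + 1)) {
            if (!parentsFilter.get(p)) {
                // Assumption of this model: every parent match must be marked as a parent.
                throw new IllegalStateException("doc " + p + " is not marked as a parent");
            }
            int previousParent = parentsFilter.previousSetBit(p - 1); // -1 if there is none
            for (int child = previousParent + 1; child < p; child++) {
                children.add(child);
            }
        }
        return children;
    }

    // Keeps only the joined children that also satisfy a child-level constraint.
    static List<Integer> filterChildren(List<Integer> children, BitSet childMatches) {
        List<Integer> kept = new ArrayList<>();
        for (int child : children) {
            if (childMatches.get(child)) {
                kept.add(child);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet parentsFilter = new BitSet(); // docs 2 and 5 are parents
        parentsFilter.set(2);
        parentsFilter.set(5);
        BitSet parentMatches = new BitSet(); // only parent 2 matches the parent query
        parentMatches.set(2);
        BitSet childMatches = new BitSet();  // docs 1 and 3 match the child constraint
        childMatches.set(1);
        childMatches.set(3);

        List<Integer> joined = toChildren(parentsFilter, parentMatches);
        System.out.println(joined);                               // prints [0, 1]
        System.out.println(filterChildren(joined, childMatches)); // prints [1]
    }
}
```

In Lucene terms, the second step would presumably correspond to combining the ToChildBlockJoinQuery with a child-level clause, for example as two MUST clauses of a BooleanQuery.  That mapping is an assumption; I have not verified it against my index.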

Jim

________________________________________
From: Gregory Dearing [gregdearing@gmail.com]
Sent: Wednesday, January 21, 2015 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: ToChildBlockJoinQuery question

Jim,

I think you hit the nail on the head... that's not what BlockJoinQueries do.

If you're wanting to search for children and join to their parents... then
use ToParentBlockJoinQuery, with a query that matches the set of children
and a filter that matches the set of parents.

If you're searching for parents, then joining to their children... then use
ToChildBlockJoinQuery, with a query that matches the set of parents and a
filter that matches the set of children.

When you add related documents to the index (via addDocuments), make sure that
children are added before their parents.
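
The ordering rule can be illustrated with a small plain-Java model.  This is
a sketch, not Lucene code: the class BlockIndexModel and its methods are
hypothetical, and a list of strings stands in for the index.  Each block is
appended with its children first and its parent last, and a bit set records
which positions hold parents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical plain-Java model of block indexing order; not Lucene code.
final class BlockIndexModel {
    final List<String> docs = new ArrayList<>();
    final BitSet parentBits = new BitSet();

    // Appends one block: all children first, then the parent as the last doc.
    void addBlock(List<String> children, String parent) {
        docs.addAll(children);
        docs.add(parent);
        parentBits.set(docs.size() - 1); // mark the position of the parent
    }

    // The parent of a child is the first parent position at or after the child.
    // For a position that is itself a parent, this returns that same position.
    int parentOf(int doc) {
        return parentBits.nextSetBit(doc);
    }

    public static void main(String[] args) {
        BlockIndexModel index = new BlockIndexModel();
        index.addBlock(Arrays.asList("Work A1", "Work A2"), "Creator A");
        index.addBlock(Arrays.asList("Work B1"), "Creator B");
        System.out.println(index.parentBits);                  // prints {2, 4}
        System.out.println(index.docs.get(index.parentOf(3))); // prints Creator B
    }
}
```

Because the relationship is recovered purely from position, a block added
with the parent first would silently attach its children to the next parent
in this model.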

The reason all the above is necessary is that it makes it possible to have
a nested hierarchy of relationships (ie. Parents have Children, which have
Children of their own).  You need a query to indicate which part of the
hierarchy you're starting from, and a filter indicating which part of the
hierarchy you're joining to.

Also, you will always get an exception if your query and your filter both
match the same document.  A child can't be its own parent.

BlockJoin is a very powerful feature, but what it's really doing is
modelling relationships using an index that doesn't know what a
relationship is.  The relationships are determined by a combination of the
order that you indexed the block, and the format of your query.  This
disconnect can lead to some weird behavior if you're not absolutely sure how
it works.

Thanks,
Greg





On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <
james.mckinley@cengage.com> wrote:

>
> Am I understanding how this is supposed to work?  What I think I am (and
> should be) doing is providing a query and filter that specifies the parent
> docs and the ToChildBlockJoinQuery should return me all the child docs for
> the resulting parent docs.  Is this correct?  The reason I think I'm not
> understanding is that I don't see why I need both a filter and a query to
> specify the parent docs when a single query or filter should suffice.  Am I
> misunderstanding what parentQuery and parentFilter mean, they both refer to
> parent docs right?
>
> Jim
>