lucene-java-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: ToChildBlockJoinQuery question
Date Thu, 22 Jan 2015 20:31:01 GMT
Great! Thanks for letting us know

-Mike

On 1/22/15 2:07 PM, McKinley, James T wrote:
> Hi Mike,
>
> I guess, given the difficulty I've had getting the block join query to work,
> it didn't occur to me to try to combine it in a BooleanQuery. :P  Using the
> BJQ in a BooleanQuery with other TermQuerys works fine and does exactly what
> I wanted!  Thanks very much for your help!
>
> Jim
> ________________________________________
> From: Michael Sokolov [msokolov@safaribooksonline.com]
> Sent: Thursday, January 22, 2015 11:45 AM
> To: java-user@lucene.apache.org
> Subject: Re: ToChildBlockJoinQuery question
>
> I think the idea is that you create a blockjoinquery that encapsulates
> the join relation, and then you can create additional constraints in the
> result document space. In the case of ToChildBJQ, the result documents
> are child documents, so any additional query constraints will be applied
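> to child documents.  For example, you could create the following:
>
> ToChildBlockJoinQuery bjq = jamesBJQ();
> TermQuery tq = new TermQuery(new Term("title", "doctor"));
> // Both clauses are required, so only children matching both are returned.
> BooleanQuery bq = new BooleanQuery();
> bq.add(bjq, Occur.MUST);
> bq.add(tq, Occur.MUST);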
>
> bq would then match books with parent (i.e. author) restrictions defined
> in jamesBJQ(), and child (i.e. book) restrictions defined by other queries
> like tq (title:doctor).
>
> -Mike
>
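The combination described above can be sketched without Lucene at all. The following is a plain-Java model, not Lucene code: the class name, the Block type, and the second (british) author are hypothetical and exist only for illustration. It assumes each block holds one author and that author's works, and it shows the parent-side restriction (the role played by the block join query) and the child-side restriction (the role played by the extra MUST clause) both being applied before a child is returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Plain-Java model of "join to children, then constrain the children"; not Lucene code. */
public class ChildConstraintModel {

    /** One indexed block: an author (parent) and the titles of that author's works (children). */
    static final class Block {
        final String author;
        final String nationality;
        final List<String> titles;

        Block(String author, String nationality, String... titles) {
            this.author = author;
            this.nationality = nationality;
            this.titles = Arrays.asList(titles);
        }
    }

    /** Returns titles whose author has the given nationality AND whose title contains the term. */
    static List<String> search(List<Block> index, String nationality, String titleTerm) {
        List<String> hits = new ArrayList<>();
        for (Block block : index) {
            // Parent-side restriction: stands in for the block join query.
            if (!block.nationality.equals(nationality)) {
                continue;
            }
            for (String title : block.titles) {
                // Child-side restriction: stands in for the additional MUST clause.
                if (title.toLowerCase(Locale.ROOT).contains(titleTerm)) {
                    hits.add(title);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Block> index = Arrays.asList(
            new Block("Lofting, Hugh", "american",
                "The Story of Doctor Dolittle", "The Twilight of Magic"),
            // Hypothetical author, added only to show a parent that is filtered out.
            new Block("Hypothetical, Author", "british", "A Doctor Abroad"));

        // Prints [The Story of Doctor Dolittle]
        System.out.println(search(index, "american", "doctor"));
    }
}
```

"A Doctor Abroad" matches the title term but is dropped because its parent fails the nationality test, and "The Twilight of Magic" has a matching parent but fails the title test. That is the behaviour of two MUST clauses evaluated in child document space.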
> On 1/22/15 11:27 AM, McKinley, James T wrote:
>> Hi Greg,
>>
>> Thanks for describing how block join queries were intended to work.  Your
>> description makes sense to me; however, according to the API docs:
>>
>> http://lucene.apache.org/core/4_8_0/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html
>>
>> and particularly the naming of the parameters, I don't think the API
>> actually works as you described:
>>
>>        ToChildBlockJoinQuery(Query parentQuery, Filter parentsFilter, boolean doScores)
>>
>> If the filter was intended to filter the child docs, I think it would be
>> called childFilter, no?
>>
>> I think the use of the CachingWrapperFilter in the example I got from Mike
>> McCandless' blog post was the real cause of the exception I was seeing
>> (maybe things have changed internally since that post).  I finally noticed
>> a mention of the FixedBitSetCachingWrapperFilter in the description of the
>> ToChildBlockJoinQuery constructor in the API docs.  When I changed to using
>> a filter produced by the FixedBitSetCachingWrapperFilter class, the
>> IllegalStateException no longer occurred, and I now get the child docs
>> using ToChildBlockJoinQuery with a parent doc filter and parent doc query.
>> The results look correctly limited by the parent constraints.  For example:
>>
>> ...
>> Gub-Gub's Book: An Encyclopedia of Food (Fictional work), Fictional work, 119320101
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> The Story of Doctor Dolittle, Being the History of His Peculiar Life at Home and Astonishing Adventures in Foreign Parts (Novel), Novel, 119200101
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> The Voyages of Doctor Dolittle (Novel), Novel, 119220101
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> The Story of Doctor Dolittle (Novel), Novel, 119200101
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> ...
>> Mister Beers (Poem), Poem, null
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> The Twilight of Magic (Novel), Novel, 119300101
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> Picnic (Lofting, Hugh) (Poem), Poem, null
>>        by: Lofting, Hugh - NP, American, Writer
>>
>> The Impossible Patriotism Project (Picture story), Picture story, 120070101
>>
>> A Skeleton in God's Closet: A Novel (Novel), Novel, 119940101
>>        by: Maier, Paul Luther - NP, American, null
>>
>> Pontius Pilate (Novel), Novel, 119680101
>>        by: Maier, Paul Luther - NP, American, null
>>
>> ...
>> Josephus: The Essential Writings (Collection), Collection, 119880101
>>        by: Maier, Paul Luther - NP, American, null
>>
>> She Said the Geese (Poem), Poem, null
>>        by: Lifshin, Lyn - NP, American, Poet
>>
>> She Said She Could See Music (Poem), Poem, null
>>        by: Lifshin, Lyn - NP, American, Poet
>> ...
>>
>> However, I see no way to further limit the children as you describe.  If I
>> use "a query that matches the set of parents and a filter that matches the
>> set of children" as you suggest, I get no results back.  I think your
>> description of how it should work makes complete sense, but that is not
>> what I'm seeing when I try it.  Here's the code that produced the above
>> output:
>>
>>        private void runToChildBlockJoinQuery(String indexPath) throws IOException {
>>                FSDirectory dir = FSDirectory.open(new File(indexPath));
>>                IndexReader reader = DirectoryReader.open(dir);
>>                IndexSearcher searcher = new IndexSearcher(reader);
>>
>>                TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
>>                BooleanQuery parentQuery = new BooleanQuery();
>>                parentQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
>>                parentQuery.add(new TermQuery(new Term("NT", "american")), Occur.MUST);
>>
>>                Filter parentFilter = new FixedBitSetCachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
>>
>>                ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
>>
>>                TopDocs worksDocs = searcher.search(tcbjq, 5000);
>>
>>                System.out.println("\n*ToChildBlockJoinQuery hit count = " + worksDocs.scoreDocs.length);
>>                displayWorks(reader, searcher, worksDocs);
>>        }
>>
>>        private void displayWorks(IndexReader reader, IndexSearcher searcher, TopDocs worksDocs) throws IOException {
>>                for (int i = 0; i < worksDocs.scoreDocs.length; i++) {
>>                        // Load the stored fields once per hit instead of once per field.
>>                        Document work = reader.document(worksDocs.scoreDocs[i].doc);
>>                        String agdn = work.get("AGDN");
>>                        String tw = work.get("TW");
>>                        String pd = work.get("PD");
>>                        String crid = work.get("CRID");
>>                        // Look up the creator (parent) documents for this work by id.
>>                        TopDocs creatorDocs = searcher.search(new TermQuery(new Term("ABID", crid)), Integer.MAX_VALUE);
>>                        System.out.println("\n" + agdn + ", " + tw + ", " + pd);
>>                        displayCreators(reader, searcher, creatorDocs);
>>                }
>>        }
>>
>>        private void displayCreators(IndexReader reader, IndexSearcher searcher, TopDocs creatorDocs) throws IOException {
>>                for (int i = 0; i < creatorDocs.scoreDocs.length; i++) {
>>                        // Load the stored fields once per hit instead of once per field.
>>                        Document creator = reader.document(creatorDocs.scoreDocs[i].doc);
>>                        String agdn = creator.get("AGDN");
>>                        String agty = creator.get("AGTY");
>>                        String nt = creator.get("NT");
>>                        String poc = creator.get("POC");
>>                        System.out.println("\tby: " + agdn + " - " + agty + ", " + nt + ", " + poc);
>>                }
>>        }
>>
>> When I try to use ToParentBlockJoinQuery I don't get any results either,
>> and it is not what I really want anyway: I want the child documents limited
>> by the parent documents.
>>
>> ToChildBlockJoinQuery almost gives me what I want, but I really need to be
>> able to filter the child docs returned as well as the parents from which
>> they came.  If you (or anybody) still think I'm doing it wrong, please let
>> me know.  If I should file a bug report, let me know that too; I have a
>> small index I can provide if it is useful.  Thanks again for your help.
>>
>> Jim
>>
>> ________________________________________
>> From: Gregory Dearing [gregdearing@gmail.com]
>> Sent: Wednesday, January 21, 2015 6:59 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: ToChildBlockJoinQuery question
>>
>> Jim,
>>
>> I think you hit the nail on the head... that's not what BlockJoinQueries do.
>>
>> If you're wanting to search for children and join to their parents... then
>> use ToParentBlockJoinQuery, with a query that matches the set of children
>> and a filter that matches the set of parents.
>>
>> If you're searching for parents, then joining to their children... then use
>> ToChildBlockJoinQuery, with a query that matches the set of parents and a
>> filter that matches the set of children.
>>
>> When you add related documents to the index (via addDocuments), make sure
>> that children are added before their parents.
>>
>> The reason all the above is necessary is that it makes it possible to have
>> a nested hierarchy of relationships (ie. Parents have Children, which have
>> Children of their own).  You need a query to indicate which part of the
>> hierarchy you're starting from, and a filter indicating which part of the
>> hierarchy you're joining to.
>>
>> Also, you will always get an exception if your query and your filter both
>> match the same document.  A child can't be its own parent.
>>
>> BlockJoin is a very powerful feature, but what it's really doing is
>> modelling relationships using an index that doesn't know what a
>> relationship is.  The relationships are determined by a combination of the
>> order that you indexed the block, and the format of your query.  This
>> disconnect can lead to some weird behavior if you're not absolutely sure how
>> it works.
>>
>> Thanks,
>> Greg
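On the question of why both a query and a filter are needed: per the ToChildBlockJoinQuery API docs cited elsewhere in this thread, parentsFilter identifies the parent documents, i.e. it marks where every block ends, while parentQuery selects which of those parents you actually want. The sketch below is a simplified plain-Java model of that doc-id arithmetic using java.util.BitSet. It is not Lucene's implementation, and the class and method names are hypothetical. It assumes the indexing order described above: children first, parent last in each block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Plain-Java model of block-join doc-id arithmetic; not Lucene code. */
public class BlockJoinModel {

    /** Children of the parent at doc id {@code parent}: every doc after the previous parent. */
    static List<Integer> childrenOf(BitSet allParents, int parent) {
        if (!allParents.get(parent)) {
            // Illustrative choice for this model: reject a "parent" the filter does not know.
            throw new IllegalStateException("doc " + parent + " is not marked as a parent");
        }
        // previousSetBit returns -1 when there is no earlier parent, so the block starts at 0.
        int firstChild = allParents.previousSetBit(parent - 1) + 1;
        List<Integer> children = new ArrayList<>();
        for (int doc = firstChild; doc < parent; doc++) {
            children.add(doc);
        }
        return children;
    }

    /** Parent of a child doc: the next marked parent at or after it, or -1 if none. */
    static int parentOf(BitSet allParents, int child) {
        return allParents.nextSetBit(child);
    }

    /** To-child join: expand each selected parent into its children. */
    static List<Integer> toChildJoin(BitSet allParents, List<Integer> selectedParents) {
        List<Integer> hits = new ArrayList<>();
        for (int parent : selectedParents) {
            hits.addAll(childrenOf(allParents, parent));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Doc ids 0..6 in index order, two blocks, children first and parent last:
        //   0, 1, 2 = works   3 = author A   4, 5 = works   6 = author B
        BitSet allParents = new BitSet();   // role of parentsFilter: ALL parents
        allParents.set(3);
        allParents.set(6);

        // Role of parentQuery: only author B (doc 6) is selected.
        System.out.println(toChildJoin(allParents, Arrays.asList(6)));  // prints [4, 5]
        System.out.println(parentOf(allParents, 1));                    // prints 3
    }
}
```

In this model the bit set has to cover every parent, not only the ones you are searching for. If doc 3 were missing from it, the children of doc 6 would be computed as docs 0 through 5, which pulls in another author's works. That is one way to read the code later in this thread, where the filter is AGTY:np alone and the query adds the NT:american restriction on top.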
>>
>>
>>
>>
>>
>> On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <
>> james.mckinley@cengage.com> wrote:
>>
>>> Am I understanding how this is supposed to work?  What I think I am (and
>>> should be) doing is providing a query and filter that specifies the parent
>>> docs and the ToChildBlockJoinQuery should return me all the child docs for
>>> the resulting parent docs.  Is this correct?  The reason I think I'm not
>>> understanding is that I don't see why I need both a filter and a query to
>>> specify the parent docs when a single query or filter should suffice.  Am I
>>> misunderstanding what parentQuery and parentFilter mean?  They both refer
>>> to parent docs, right?
>>>
>>> Jim
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>