lucene-dev mailing list archives

From "Hoss Man (Commented) (JIRA)" <>
Subject [jira] [Commented] (SOLR-3076) Solr should support block joins
Date Fri, 10 Feb 2012 02:09:59 GMT


Hoss Man commented on SOLR-3076:

bq. Maybe there can be field aliases? Eg, book_page_count:[0 to 1000] and chapter_page_count[10:40],
and the QP is told to map book_page_count -> parent:size and chapter_page_count -> child:size?
Or maybe we let the user explicitly scope the field, eg chapter:size, book:size, book:title,
etc. Not sure...

Hmmm... i kind of understand what you're saying; but the part i'm not understanding is: even
if you had field aliasing like that, given some query string like...

  book_page_count:[0 TO 1000] and chapter_page_count:[10 TO 40]

...how would the parser know whether the user was asking for the results to be "book documents"
matching those criteria (1-1000 pages and containing at least one chapter child containing
10-40 pages), or "chapter documents" matching those criteria (10-40 pages, contained in a book
of 1-1000 pages), or "page documents" (all pages contained in a chapter of 10-40 total
pages, contained in a book of 1-1000 total pages)?

I mean: it seems possible, and a QParser like that could totally support configuring those
types of field mappings / hierarchy definitions in init params, but perhaps we should focus
for now on the more explicit, direct-mapping style of QParser that Mikhail has already started
on, and consider that as an enhancement later?  (especially since it's not clear how
the indexing side will be managed/enforced -- depending on how that shapes up, it might be
possible for a QParser like you're describing, or perhaps _all_ QParsers, to infer the field
rules from the schema or some other configuration)

I think the syntax in Mikhail's BlockJoinParentQParserPlugin looks great as a straightforward
baseline implementation.  The one straw man suggestion i might toss out there for consideration
would be to invert the use of the "filter" and "v" local params, so instead of...

{!parent filter="parent:true"}child_name:b
{!parent filter="parent:true"}

...it might be...

{!parent of="child_names:b"}parent:true

...people may find that easier to read as a way to understand that the final query will return
"parent documents" constrained such that those parent documents have children matching the
"of" query.  The one thing i don't like about this "of" idea is that (compared to the "filter" param
Mikhail uses) it might be more tempting for people to use something like...

// WRONG! (i think)
q={!parent of="child_names:b"}some_parent_field:foo

...when they mean to write something like this...

q={!parent of="child_names:b"}some_query_that_identifies_the_set_of_all_parents

...because as i understand it, it's important for the "parentFilter" to identify *all* of
the parent documents, even ones you may not want returned, so that the ToParentBlockJoinQuery
knows how to identify the parent of each document (correct?)
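
To make that concern concrete, here is a toy model in Python (not Lucene code, and not how ToParentBlockJoinQuery is actually implemented). It assumes only that each block is indexed as children followed by their parent, and that a matching child is attributed to the next document accepted by the parent filter. The function name and the sample documents are made up for illustration.

```python
# Toy block index: children first, then their parent. Field values are invented.
DOCS = [
    {"id": 0, "type": "child", "child_names": "a"},
    {"id": 1, "type": "child", "child_names": "b"},
    {"id": 2, "type": "parent", "some_parent_field": "bar"},  # parent of 0 and 1
    {"id": 3, "type": "child", "child_names": "c"},
    {"id": 4, "type": "parent", "some_parent_field": "foo"},  # parent of 3
]


def to_parent_join(docs, is_parent, child_matches):
    """Return ids of docs accepted by is_parent that follow a matching child.

    A matching child is credited to the first later doc for which is_parent
    is true, so is_parent is what defines the block boundaries.
    """
    hits = []
    pending = False  # has a matching child been seen since the last boundary?
    for doc in docs:
        if is_parent(doc):
            if pending:
                hits.append(doc["id"])
            pending = False
        elif child_matches(doc):
            pending = True
    return hits


def child_b(doc):
    return doc.get("child_names") == "b"


# Filter that identifies the set of all parents: doc 2 is found.
right = to_parent_join(DOCS, lambda d: d["type"] == "parent", child_b)

# Filter that only matches the parents you want back: doc 2 is no longer a
# boundary, so child 1 is credited to doc 4, which has no such child.
wrong = to_parent_join(DOCS, lambda d: d.get("some_parent_field") == "foo", child_b)

print(right, wrong)  # [2] [4]
```

In this simplified model the narrow filter silently returns the wrong parent; what the real query does in that situation may differ, but the dependence on the filter for block boundaries is the point.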

This type of user confusion is still possible with the syntax Mikhail's got, but i suspect
it will be less likely --- In any case, i wanted to put the idea out there.

Given McCandless's supposition that the parent/child relationships are likely to be very consistent,
not very deep, and not vary from query to query, one thing we could do to help mitigate
this possible confusion would be:
 * make the "filter" param name much longer and verbose, ie: {{setOfAllParentsQuery}}
 * make the param optional, and have it default to something specified as an init param, ie:
 * make the init param mandatory
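
A rough sketch of how those three rules could fit together, written as plain Python for brevity (the actual plugin would be Java). The function name `resolve_parent_filter` and the use of dicts for the local params and init params are assumptions for illustration; `defaultSetOfAllParentsQuery` is used here as the assumed name of the init param.

```python
def resolve_parent_filter(local_params, init_params):
    """Pick the query string that identifies the set of all parents.

    local_params: per-request local params, e.g. parsed from {!parent ...}
    init_params:  params configured on the query parser plugin
    """
    default = init_params.get("defaultSetOfAllParentsQuery")
    if not default:
        # The init param is mandatory, so fail loudly at configuration time.
        raise ValueError("defaultSetOfAllParentsQuery must be configured")
    # The verbose per-request param is optional and overrides the default.
    return local_params.get("setOfAllParentsQuery", default)


init = {"defaultSetOfAllParentsQuery": "type:parent"}
print(resolve_parent_filter({}, init))                                     # type:parent
print(resolve_parent_filter({"setOfAllParentsQuery": "type:book"}, init))  # type:book
```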

That way, in the common case people will configure things like...

<queryParser name="parent" class="solr.BlockJoinParentQParserPlugin">
  <str name="defaultSetOfAllParentsQuery">type:parent</str>
</queryParser>

...and their queries will be simple...

q={!parent}              (all parent docs)
q={!parent}foo:bar       (all parent docs that contain kid docs matching foo:bar)

...but it will still be possible for people with more complex use cases to do more complex

Mikhail: some other minor feedback on the parts of your patch that i understood
(note: my lack of understanding is not a fault of your patch, it's just that most of the block
join stuff is very foreign to me)...

* please prune down "solrconfig-bjqparser.xml" so it contains only the absolute minimum things
you need for the test case, it makes it a lot easier for people to review the patch, and for
users to understand what is necessary to utilize features demoed in the test (we have a lot
of old bloated solrconfig files in the test dir, but we're trying to stop doing that)
* the test would be a bit easier to follow if you used different letters for the parent fields
vs the child fields (abcdef, vs xyz for example)
* it would be good to have tests verifying that nested parent queries work as expected, ie:
that something like this works...
q={!parent filter="type:book" v=$chapters}
chapters=+chapter_title:Solr +_query_:{!parent filter="type:chapter" v=$pages}
* it would be good to have your tests introspect the cache after doing the query to make sure
the number of inserts, lookups, and hits match what you expect.
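
For the nested-parent case suggested above, the expected outcome can be worked out with a toy Python model (not Lucene or Solr code). It assumes blocks are laid out as pages, then their chapter, and finally the book after its last chapter. The `page_text` field, the "block join" page query standing in for `$pages`, and the sample books are all invented for illustration.

```python
def to_parent_join(docs, is_parent, child_matches):
    """Credit each matching child to the next doc accepted by is_parent."""
    hits, pending = [], False
    for doc in docs:
        if is_parent(doc):
            if pending:
                hits.append(doc["id"])
            pending = False
        elif child_matches(doc):
            pending = True
    return hits


def index_books(books):
    """Flatten {book: {chapter: [page texts]}} into block order with ids."""
    docs = []
    for book_title, chapters in books.items():
        for chapter_title, pages in chapters.items():
            for text in pages:
                docs.append({"type": "page", "page_text": text})
            docs.append({"type": "chapter", "chapter_title": chapter_title})
        docs.append({"type": "book", "book_title": book_title})
    for i, doc in enumerate(docs):
        doc["id"] = i
    return docs


DOCS = index_books({
    "A": {"Solr": ["intro", "block join"], "Lucene": ["block join"]},
    "B": {"Solr": ["faceting"]},
    "C": {"Lucene": ["block join"]},
})

# Inner join: chapters containing a page that matches the (assumed) page query.
chapter_ids = set(to_parent_join(
    DOCS,
    lambda d: d["type"] == "chapter",
    lambda d: d.get("page_text") == "block join",
))

# Outer join: books containing a chapter titled Solr that passed the inner join.
book_ids = to_parent_join(
    DOCS,
    lambda d: d["type"] == "book",
    lambda d: d["id"] in chapter_ids and d.get("chapter_title") == "Solr",
)

titles = [DOCS[i]["book_title"] for i in book_ids]
print(titles)  # ['A']
```

Book B has a Solr chapter but no matching page, and book C has a matching page but no Solr chapter, so a test along these lines would expect only book A.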

...but like i said: all in all i think it's really good.
> Solr should support block joins
> -------------------------------
>                 Key: SOLR-3076
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>         Attachments: SOLR-3076.patch, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch,
parent-bjq-qparser.patch, parent-bjq-qparser.patch, solrconf-bjq-erschema-snippet.xml
> Lucene has the ability to do block joins, we should add it to Solr.
