lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4768) Child Traversable To Parent Block Join Query
Date Mon, 11 Feb 2013 16:11:15 GMT


Mark Harwood commented on LUCENE-4768:

OK - this problem seems to be about an ill-defined user query ("Saturn sky blue Sedan" with
no explicit fields) being executed against a well-defined schema (cars with manufacturers,
model names and bodyStyles that also have trims with colours).

If that's the case, you have a heap of problems here that aren't necessarily related to the
"block join" implementation. One example: IDF ranking being what it is, if a manufacturer
like Ford creates a model called the "Blue", or bad data entry stores this value in the wrong
field, then Lucene will naturally rank model:blue higher than color:blue because the token
"blue" is scarce in that field context. That's almost the inverse of what you want.
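To make the IDF point concrete, here is a small Java sketch using the classic Lucene DefaultSimilarity idf formula, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are invented for illustration: "blue" is assumed to appear in 1,500 of 10,000 color fields but in only 3 model fields (a rare "Blue" model or a few mis-filed values).

```java
import java.util.Locale;

public class FieldIdfDemo {
    // Classic Lucene idf (DefaultSimilarity in 4.x): 1 + ln(numDocs / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long numDocs = 10_000;    // hypothetical corpus size
        long blueInColor = 1_500; // hypothetical: "blue" is a common colour value
        long blueInModel = 3;     // hypothetical: rare "Blue" model or bad data entry

        System.out.printf(Locale.ROOT, "color:blue idf = %.3f%n", idf(blueInColor, numDocs));
        System.out.printf(Locale.ROOT, "model:blue idf = %.3f%n", idf(blueInModel, numDocs));
    }
}
```

With these hypothetical counts model:blue gets an idf of roughly 8.8 versus roughly 2.9 for color:blue, so a handful of stray model values outweighs the common, correct interpretation.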

A couple of suggestions for "field-less" queries like your example "Saturn sky blue sedan":
1) Target the query at an unstructured "onebox" field that holds indexed content from all
fields, to achieve a more balanced IDF score.
2) Tokenize the query string and find a "most likely" field for each search term by
examining doc frequencies, e.g. color:blue vs modelName:blue. Augment the "onebox" query
from 1) with the most-likely-field interpretation of each word in the query string, provided
that interpretation has sufficient doc frequency.
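A rough Java sketch of suggestions 1) and 2) together. The per-field doc frequencies are hypothetical hard-coded numbers; against a real index you would look them up with something like IndexReader.docFreq(new Term(field, term)). The minDocFreq threshold and the query-parser-style output string are illustrative choices only; real code would build a BooleanQuery rather than a string.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class MostLikelyFieldDemo {
    /** Field with the highest doc frequency for term, or null if below minDocFreq. */
    static String mostLikelyField(String term, Map<String, Map<String, Integer>> docFreqs,
                                  int minDocFreq) {
        String best = null;
        int bestDf = 0;
        for (Map.Entry<String, Map<String, Integer>> e : docFreqs.entrySet()) {
            int df = e.getValue().getOrDefault(term, 0);
            if (df > bestDf) { // strict '>' keeps the first field on ties
                bestDf = df;
                best = e.getKey();
            }
        }
        return bestDf >= minDocFreq ? best : null;
    }

    /** "onebox" query over all terms, augmented with a fielded clause per confident term. */
    static String buildQuery(String userQuery, Map<String, Map<String, Integer>> docFreqs,
                             int minDocFreq) {
        String[] terms = userQuery.toLowerCase(Locale.ROOT).trim().split("\\s+");
        StringBuilder sb = new StringBuilder("onebox:(" + String.join(" ", terms) + ")");
        for (String t : terms) {
            String field = mostLikelyField(t, docFreqs, minDocFreq);
            if (field != null) sb.append(' ').append(field).append(':').append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical doc frequencies per field
        Map<String, Map<String, Integer>> df = new LinkedHashMap<>();
        df.put("manufacturer", Map.of("saturn", 420));
        df.put("modelName", Map.of("sky", 35, "blue", 2));
        df.put("color", Map.of("blue", 1500));
        df.put("bodyStyle", Map.of("sedan", 2600));
        System.out.println(buildQuery("Saturn sky blue Sedan", df, 10));
    }
}
```

Running main prints `onebox:(saturn sky blue sedan) manufacturer:saturn modelName:sky color:blue bodyStyle:sedan`. "blue" is mapped to color rather than modelName because color has by far the higher doc frequency, which is the opposite of what raw IDF would favour.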

> Child Traversable To Parent Block Join Query
> --------------------------------------------
>                 Key: LUCENE-4768
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>         Environment: trunk
> git rev-parse HEAD
> 5cc88eaa41eb66236a0d4203cc81f1eed97c9a41
>            Reporter: Vadim Kirilchuk
>         Attachments: LUCENE-4768-draft.patch
> Hi everyone!
> Let me describe what I am trying to do:
> I have hierarchical documents ('car model' as parent, 'trim' as child) and use block
join queries to retrieve them. However, I am not happy with the current behavior of ToParentBlockJoinQuery,
which goes through all of a parent's children during the nextDoc call (accumulating scores and freqs).
> Consider the following example: you have a query with a custom post condition on top
of such a bjq, and during the post condition you traverse the scorer tree (doc-at-a-time) and want to
manually push the child scorers of the bjq one by one until the condition passes or the current parent has
no more children.
> I am attaching a patch with a query (and some tests) similar to ToParentBlockJoinQuery but
with the ability to traverse children. (I have to do a weird instanceof check and cast inside
my code.) This is a draft only, and I will be glad to hear whether anyone needs it or how
we can improve it.
> P.S. I believe the proposed query is more generic (lower level) than ToParentBJQ, and ToParentBJQ
could be extended from it and call nextChild() internally during nextDoc().
> Also, I think the problem of traversing hierarchical documents is more complex, as
Lucene only has the nextDoc API. What do you think about making the API more hierarchy-aware? A one-level
document is a special case of a multi-level document, but not vice versa. WDYT?
> Thanks in advance.
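The difference between eager child consumption and one-child-at-a-time traversal can be illustrated with a toy model that uses no Lucene classes. It relies on the block-join index layout, where each block holds the child documents immediately followed by their parent, with parents marked in a bitset. The class and method names (ChildTraversalDemo, nextParent, nextChild) are hypothetical and are not the API in LUCENE-4768-draft.patch; the child query is modelled as an IntPredicate. countAllMatchingChildren mimics the ToParentBlockJoinQuery behaviour of consuming every child, and is written on top of nextChild() in the way the proposal suggests; firstChildSatisfying stops at the first child that passes a post condition.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Toy block-join index: child docs followed by their parent; all names hypothetical. */
public class ChildTraversalDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final BitSet parents;          // set bit = parent document
    private final int maxDoc;
    private final IntPredicate childQuery; // stands in for the child scorer
    private int parentDoc = -1;            // current parent
    private int childDoc = -1;             // last child visited in the current block

    ChildTraversalDemo(BitSet parents, int maxDoc, IntPredicate childQuery) {
        this.parents = parents;
        this.maxDoc = maxDoc;
        this.childQuery = childQuery;
    }

    /** Moves to the next parent, positioned before the first child of its block. */
    int nextParent() {
        if (parentDoc == NO_MORE_DOCS) return NO_MORE_DOCS;
        int next = parents.nextSetBit(parentDoc + 1);
        if (next < 0 || next >= maxDoc) {
            parentDoc = NO_MORE_DOCS;
            return NO_MORE_DOCS;
        }
        childDoc = parentDoc; // the block starts right after the previous parent
        parentDoc = next;
        return parentDoc;
    }

    /** Next matching child of the current parent, or NO_MORE_DOCS when the block is done. */
    int nextChild() {
        if (parentDoc == NO_MORE_DOCS) return NO_MORE_DOCS;
        while (childDoc + 1 < parentDoc) {
            childDoc++;
            if (childQuery.test(childDoc)) return childDoc;
        }
        return NO_MORE_DOCS;
    }

    /** Eager, ToParentBlockJoinQuery-like: consume every matching child (here: count them). */
    int countAllMatchingChildren() {
        int freq = 0;
        while (nextChild() != NO_MORE_DOCS) freq++;
        return freq;
    }

    /** Lazy: stop at the first matching child that satisfies the post condition. */
    int firstChildSatisfying(IntPredicate postCondition) {
        for (int c = nextChild(); c != NO_MORE_DOCS; c = nextChild()) {
            if (postCondition.test(c)) return c;
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        // Docs 0-2 are trims of car model doc 3; docs 4-5 are trims of car model doc 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        String[] trimColor = {"red", "blue", "blue", null, "black", "blue", null};

        ChildTraversalDemo eager = new ChildTraversalDemo(parents, 7, d -> !parents.get(d));
        for (int p = eager.nextParent(); p != NO_MORE_DOCS; p = eager.nextParent()) {
            System.out.println("parent " + p + ": " + eager.countAllMatchingChildren() + " children consumed");
        }

        ChildTraversalDemo lazy = new ChildTraversalDemo(parents, 7, d -> !parents.get(d));
        for (int p = lazy.nextParent(); p != NO_MORE_DOCS; p = lazy.nextParent()) {
            int hit = lazy.firstChildSatisfying(d -> "blue".equals(trimColor[d]));
            System.out.println("parent " + p + ": first blue trim = " + hit);
        }
    }
}
```

With the sample data the eager pass consumes 3 and 2 children, while the lazy pass for car model doc 3 stops at trim doc 1 without ever visiting doc 2.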

This message is automatically generated by JIRA.
