lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4768) Child Traversable To Parent Block Join Query
Date Mon, 11 Feb 2013 11:35:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575740#comment-13575740 ]

Mark Harwood commented on LUCENE-4768:
--------------------------------------

As with any discussion about nested queries, you need to be very clear about the required logic.
When you talk about matching f1:A or f1:B, are we talking about matches on the same child
doc, or possibly matches on different child docs of the same parent? The examples don't make
this clear.
If we assume your child-based criteria are focused on examining the contents of single children
(as opposed to combining f1:A on one child doc with f1:B on a different child doc), then a
BooleanQuery that combines these child query elements will already be sufficient for skipping
through children.
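
To make the two readings concrete, here is a minimal sketch in plain Java. It does not use
the Lucene API: the class NestedMatchSketch, its method names, and the map-based model of a
child doc are all hypothetical, chosen only to show how "both criteria on one child" differs
from "each criterion on some child of the same parent".

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model (NOT the Lucene API): a parent is represented
// by the list of its child docs, and each child doc is a field -> value map.
final class NestedMatchSketch {

    // Reading 1: a single child doc must satisfy both criteria.
    static boolean sameChildMatches(List<Map<String, String>> children,
                                    Predicate<Map<String, String>> first,
                                    Predicate<Map<String, String>> second) {
        for (Map<String, String> child : children) {
            if (first.test(child) && second.test(child)) {
                return true;
            }
        }
        return false;
    }

    // Reading 2: each criterion may be satisfied by a different child doc
    // of the same parent.
    static boolean acrossChildrenMatches(List<Map<String, String>> children,
                                         Predicate<Map<String, String>> first,
                                         Predicate<Map<String, String>> second) {
        boolean sawFirst = false;
        boolean sawSecond = false;
        for (Map<String, String> child : children) {
            sawFirst |= first.test(child);
            sawSecond |= second.test(child);
        }
        return sawFirst && sawSecond;
    }

    public static void main(String[] args) {
        // One parent with two children: f1=A on one child, f1=B on the other.
        List<Map<String, String>> children =
                List.of(Map.of("f1", "A"), Map.of("f1", "B"));
        Predicate<Map<String, String>> isA = c -> "A".equals(c.get("f1"));
        Predicate<Map<String, String>> isB = c -> "B".equals(c.get("f1"));
        System.out.println("same child:      " + sameChildMatches(children, isA, isB));
        System.out.println("across children: " + acrossChildrenMatches(children, isA, isB));
    }
}
```

With the sample parent in main, no single child carries both values, so the first reading
does not match while the second does. Only the first reading is the one a single child-level
BooleanQuery can express.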

I'm not really sure what you are trying to optimize with skipping anyway - parent-child combos
are limited to what fits into a single segment, which is in turn limited by RAM. You don't
generally get parents with "many many" children because of these constraints. The "nextDoc"
calls you are trying to skip read from a compressed block of child doc IDs (gap-encoded
varints) that is read off disk in 1K chunks (if I recall default Directory settings correctly).
The chances are high that the limited number of child doc IDs that belong to each parent are
already in RAM as part of normal disk access patterns, so there is no real saving in disk IO.
Are you sure this is a performance bottleneck?
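
For readers unfamiliar with the term, "gap-encoded varints" can be illustrated with the
simplified sketch below. It is not Lucene's actual postings codec: GapVarintSketch and its
methods are hypothetical names, and the byte layout (7 data bits per byte, low-order bits
first, high bit set when more bytes follow) is only an assumption modelled on the usual
variable-length int scheme.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Simplified illustration only; NOT Lucene's real on-disk postings format.
final class GapVarintSketch {

    // Encodes ascending doc IDs as the gaps between neighbours, each gap
    // written as a variable-length int of 7 data bits per byte.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // high bit: more bytes follow
                gap >>>= 7;
            }
            out.write(gap);
            previous = docId;
        }
        return out.toByteArray();
    }

    // Decodes `count` doc IDs by summing the stored gaps back up.
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] childDocIds = {1000, 1001, 1003, 1010}; // children of one parent
        byte[] encoded = encode(childDocIds);
        System.out.println(encoded.length + " bytes for " + childDocIds.length + " doc IDs");
        System.out.println(Arrays.toString(decode(encoded, childDocIds.length)));
    }
}
```

Because the children of one parent sit at adjacent doc IDs, most gaps are small and take a
single byte each, which is why a parent's whole run of child doc IDs tends to fit well inside
one buffered read.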



                
> Child Traversable To Parent Block Join Query
> --------------------------------------------
>
>                 Key: LUCENE-4768
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4768
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>         Environment: trunk
> git rev-parse HEAD
> 5cc88eaa41eb66236a0d4203cc81f1eed97c9a41
>            Reporter: Vadim Kirilchuk
>         Attachments: LUCENE-4768-draft.patch
>
>
> Hi everyone!
> Let me describe what I am trying to do:
> I have hierarchical documents ('car model' as parent, 'trim' as child) and use block
> join queries to retrieve them. However, I am not happy with the current behavior of
> ToParentBlockJoinQuery, which goes through all of a parent's children during a nextDoc
> call (accumulating scores and freqs).
> Consider the following example: you have a query with a custom post condition on top
> of such a BJQ, and during the post condition you traverse the scorer tree (doc-at-a-time)
> and want to manually push the child scorers of the BJQ one by one until the condition
> passes or the current parent has no more children.
> I am attaching a patch with a query (and some tests) similar to ToParentBlockJoin but
> with the ability to traverse children. (I have to do a weird instanceof check and cast
> inside my code.) This is only a draft, and I will be glad to hear if someone needs it or
> to hear how we can improve it.
> P.S. I believe that the proposed query is more generic (lower level) than ToParentBJQ, and
> ToParentBJQ could be derived from it and call nextChild() internally during nextDoc().
> Also, I think that the problem of traversing hierarchical documents is more complex, as
> Lucene has only the nextDoc API. What do you think about making the API more hierarchy-aware?
> A one-level document is a special case of a multi-level document, but not vice versa. WDYT?
> Thanks in advance.
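
The traversal described in the quoted issue can be pictured with the sketch below. It is not
the attached patch and not the Lucene API: ChildTraversalSketch, nextParent() and NO_MORE are
hypothetical names (only nextChild() is mentioned in the issue), and the sketch assumes the
usual block layout in which a parent's children occupy the doc IDs immediately before the
parent.

```java
import java.util.BitSet;

// Hypothetical sketch, NOT the patch attached to the issue: the caller pulls
// matching children of the current parent one at a time and may stop early.
final class ChildTraversalSketch {
    static final int NO_MORE = -1;

    private final int[] matchingChildren; // ascending doc IDs of matching child docs
    private final BitSet parents;         // bit set at every parent doc ID
    private int pos = 0;                  // next unconsumed matching child
    private int currentParent = NO_MORE;

    ChildTraversalSketch(int[] matchingChildren, BitSet parents) {
        this.matchingChildren = matchingChildren;
        this.parents = parents;
    }

    // Moves to the next parent that has at least one matching child,
    // discarding any children of the previous parent that were not consumed.
    int nextParent() {
        while (pos < matchingChildren.length && matchingChildren[pos] < currentParent) {
            pos++;
        }
        if (pos == matchingChildren.length) {
            currentParent = NO_MORE;
        } else {
            // Children precede their parent, so the parent is the next set bit.
            currentParent = parents.nextSetBit(matchingChildren[pos]);
        }
        return currentParent;
    }

    // Returns the next matching child of the current parent, or NO_MORE.
    int nextChild() {
        if (currentParent == NO_MORE
                || pos >= matchingChildren.length
                || matchingChildren[pos] >= currentParent) {
            return NO_MORE;
        }
        return matchingChildren[pos++];
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3); // children 0..2
        parents.set(6); // children 4..5
        parents.set(8); // child 7
        ChildTraversalSketch it = new ChildTraversalSketch(new int[] {1, 2, 5}, parents);
        for (int p = it.nextParent(); p != NO_MORE; p = it.nextParent()) {
            // A post condition could stop after the first child instead of draining all.
            System.out.println("parent " + p + ", first matching child " + it.nextChild());
        }
    }
}
```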

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

