lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <>
Subject [jira] Commented: (LUCENE-2454) Nested Document query support
Date Mon, 28 Feb 2011 17:22:36 GMT


Paul Elschot commented on LUCENE-2454:

bq. I think your proposal here is related to a new (to me) use case where clients can add
a single new "child" document and the index automagically reorganises to assemble all prior
related documents back into a structure where they are grouped as contiguous documents held
in the same segment?


The first two fields match the ones I intended.
The third field, for the document type, would be quite useful for searching, but it may not
be needed for maintaining the document order.

The intention is quite simple: allow a set of documents to provide a single score value
during a search. AFAICT that fits most of the use cases described here.
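
As a rough illustration of that intention, the sketch below folds the scores of matching
documents into one score per set. It is only a sketch under assumptions: the set
representatives are marked in a java.util.BitSet, each representative is stored after its
children (post-order), and scores are combined by summing. The class and method names are
hypothetical and not part of Lucene.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Lucene API.
final class SetScoreSketch {
    /**
     * Folds per-document scores into one score per set.
     * Assumes post-order: the representative is the first set bit at or after a doc id.
     * Summing is an assumed combination rule, chosen only for illustration.
     */
    static Map<Integer, Float> scorePerSet(BitSet representatives, int[] matchingDocs, float[] scores) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (int i = 0; i < matchingDocs.length; i++) {
            int rep = representatives.nextSetBit(matchingDocs[i]);
            if (rep < 0) {
                continue; // doc lies after the last representative, so it belongs to no set
            }
            result.merge(rep, scores[i], Float::sum);
        }
        return result;
    }
}
```

With representatives at doc ids 2 and 4, matches on docs 0 and 1 are reported on doc 2, and
a match on doc 3 is reported on doc 4.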

To allow conjunctions inside such a set, a scorer must be able to advance() into the set,
and for that it might be better to put the set representative before its children. The document
order would then be pre-order instead of post-order, which should make it no more difficult
to keep the docs in order.
With the representative before the children, an extra operation (something like previousDocId())
would be needed on the iterator of the filter.
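
A minimal sketch of such a conjunction in pre-order, assuming the representatives are held
in a java.util.BitSet and each term's matches are a sorted array of doc ids.
BitSet.previousSetBit() stands in for the previousDocId() operation mentioned above, and the
linear skipping loops stand in for advance(). All class and method names are hypothetical,
not Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not Lucene API.
final class SetConjunctionSketch {
    /** First doc id after the set containing docId (pre-order), or maxDoc if it is the last set. */
    static int endOfSet(BitSet representatives, int docId, int maxDoc) {
        int next = representatives.nextSetBit(docId + 1);
        return next < 0 ? maxDoc : next;
    }

    /** Returns the representatives whose set contains at least one doc from docsA and one from docsB. */
    static List<Integer> matchingSets(BitSet representatives, int[] docsA, int[] docsB, int maxDoc) {
        List<Integer> matches = new ArrayList<>();
        int a = 0;
        int b = 0;
        while (a < docsA.length && b < docsB.length) {
            // Pre-order: the representative is the nearest set bit at or before the doc.
            int repA = representatives.previousSetBit(docsA[a]);
            int repB = representatives.previousSetBit(docsB[b]);
            if (repA == repB) {
                if (repA >= 0) {
                    matches.add(repA); // both terms occur inside the same set
                }
                int end = endOfSet(representatives, docsA[a], maxDoc);
                while (a < docsA.length && docsA[a] < end) a++; // skip the rest of this set
                while (b < docsB.length && docsB[b] < end) b++;
            } else if (repA < repB) {
                while (a < docsA.length && docsA[a] < repB) a++; // advance A into B's set
            } else {
                while (b < docsB.length && docsB[b] < repA) b++; // advance B into A's set
            }
        }
        return matches;
    }
}
```

In post-order the same lookup would use nextSetBit() instead, which a forward-only iterator
already supports; that is why only the pre-order layout needs the extra backward operation.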

I don't know about flushes during merging. One operation that would recur during index maintenance
is appending a sequence of documents from one segment to another segment, see docs 1, 2 and
3 above.
This is indeed what needs to be done when a new child is added, or when an existing one is
changed, i.e. deleted and added.
I'm not familiar with the merging code, but I would suppose something very close to appending
a sequence of documents from an existing segment is already available. Anyway this is costly,
but that is the price to pay.

During searching, the term filters used for the node representatives might use some optimizations.
Since one term filter is needed for every document scorer involved in searching the query
and these term filters are all based on the same term, they could share index information,
for example in a filter cache.
A bit set is not always optimal for such filters; perhaps a more tree-like structure could
be more compact and faster. But bit sets could be used to get this going.
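
A sketch of the sharing idea, assuming the filter for a term can be represented as a
java.util.BitSet that is computed once and then read by every scorer. The cache class and
its loader function are hypothetical; a real cache would presumably also be keyed per
segment or reader.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache, not Lucene API.
final class RepresentativeFilterCache {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final Function<String, BitSet> loader;

    /** loader builds the bit set of representative doc ids for a term; it must not return null. */
    RepresentativeFilterCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    /** Returns the shared bit set for the term. Callers must treat it as read-only. */
    BitSet get(String term) {
        return cache.computeIfAbsent(term, loader);
    }
}
```

Every scorer that asks for the same term gets the same BitSet instance, so the index is
read only once per term.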

The good news so far for me is that this seems to be feasible, thanks.

> Nested Document query support
> -----------------------------
>                 Key: LUCENE-2454
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments:
> A facility for querying nested documents in a Lucene index as outlined in
