lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents
Date Tue, 17 May 2011 13:14:48 GMT


Michael McCandless commented on LUCENE-3112:

bq. Yet, I think you should push the document iteration etc into DWPT to actually apply the
delterm only once to make it really atomic.

Ahh good point -- it's wrong just passing that delTerm down N times, too.  I'll fix.
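
To illustrate why passing the delTerm down once per document could misbehave, here is a minimal toy sketch in Java. It is not Lucene code: DelTermSketch, Doc, updateDocumentsNaive and updateDocumentsOnce are hypothetical names, and it assumes every document in the block carries the delete term itself (eg, a shared id), which may not match every use case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Lucene's IndexWriter/DWPT. All names are hypothetical.
class DelTermSketch {
    // A document reduced to the one term used for deletion plus a label.
    static final class Doc {
        final String key;
        final String label;
        Doc(String key, String label) { this.key = key; this.label = label; }
    }

    private final List<Doc> live = new ArrayList<>();

    // Single-document update: delete everything matching delKey, then add doc.
    void updateDocument(String delKey, Doc doc) {
        live.removeIf(d -> d.key.equals(delKey));
        live.add(doc);
    }

    // Naive block update: the delete is applied N times, once per document,
    // so each call also removes the block's own earlier documents.
    void updateDocumentsNaive(String delKey, List<Doc> docs) {
        for (Doc d : docs) {
            updateDocument(delKey, d);
        }
    }

    // Block update with the delete applied exactly once, before the adds.
    void updateDocumentsOnce(String delKey, List<Doc> docs) {
        live.removeIf(d -> d.key.equals(delKey));
        live.addAll(docs);
    }

    List<String> labels() {
        List<String> out = new ArrayList<>();
        for (Doc d : live) out.add(d.label);
        return out;
    }

    public static void main(String[] args) {
        List<Doc> block = Arrays.asList(
            new Doc("order:1", "child1"),
            new Doc("order:1", "child2"),
            new Doc("order:1", "parent"));

        DelTermSketch naive = new DelTermSketch();
        naive.updateDocumentsNaive("order:1", block);
        System.out.println("naive: " + naive.labels());

        DelTermSketch once = new DelTermSketch();
        once.updateDocumentsOnce("order:1", block);
        System.out.println("once:  " + once.labels());
    }
}
```

In this toy model the naive variant keeps only the last document of the block, while the apply-once variant keeps the whole block.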

bq. I also wonder if we should allow multiple delTerms, e.g. Tuple<DelTerm, Document>;
otherwise you would be bound to one delTerm per "collection", but what if you want to remove
only one of the "sub-documents"?

So, this won't work today w/ nested querying, if I understand it right.  Ie, if you only update
one of the subs, now your subdocs are no longer sequential (nor in one segment).  So I think
"design for today" here...?

Someday, when we implement incremental field updates correctly, so that updates are written
as stacked segments against the original segment containing the document, at that point I
think we can add an API that lets you update multiple docs atomically?

> Add IW.add/updateDocuments to support nested documents
> ------------------------------------------------------
>                 Key: LUCENE-3112
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: LUCENE-3112.patch
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you want to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable<Document>), updateDocuments(Term
> delTerm, Iterable<Document>)) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.
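
A rough way to picture the guarantee described above is the following toy sketch in Java. It is not the LUCENE-3112 patch and does not use Lucene classes: BlockAddSketch and its methods are hypothetical, documents are plain strings, and a "segment" is just a list whose positions stand in for docIDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Lucene's IndexWriter. All names are hypothetical.
class BlockAddSketch {
    // Each flushed segment is a list of docs; list position = docID in that segment.
    private final List<List<String>> segments = new ArrayList<>();
    private List<String> buffer = new ArrayList<>();

    // Single add: a flush may run between two separate calls.
    synchronized void addDocument(String doc) {
        buffer.add(doc);
    }

    // Block add: the lock is held for the whole iteration, so flush() cannot
    // interleave and the docs get consecutive positions in one buffer.
    synchronized void addDocuments(Iterable<String> docs) {
        for (String doc : docs) {
            buffer.add(doc);
        }
    }

    // Turns the current buffer into a new segment (eg, on commit or getReader()).
    synchronized void flush() {
        if (!buffer.isEmpty()) {
            segments.add(buffer);
            buffer = new ArrayList<>();
        }
    }

    // Returns a copy of the flushed segments.
    synchronized List<List<String>> segments() {
        List<List<String>> copy = new ArrayList<>();
        for (List<String> s : segments) copy.add(new ArrayList<>(s));
        return copy;
    }

    public static void main(String[] args) {
        // Separate calls: a flush in between splits the group across segments.
        BlockAddSketch split = new BlockAddSketch();
        split.addDocument("child1");
        split.flush();
        split.addDocument("parent");
        split.flush();
        System.out.println("separate adds: " + split.segments());

        // Block call: the whole group lands in one segment, in iterator order.
        BlockAddSketch block = new BlockAddSketch();
        block.addDocuments(Arrays.asList("child1", "child2", "parent"));
        block.flush();
        System.out.println("block add:     " + block.segments());
    }
}
```

The sketch uses one coarse lock purely for simplicity; how the real change keeps the block together inside the indexing chain is a separate question that this model does not try to answer.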

This message is automatically generated by JIRA.