lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents
Date Tue, 17 May 2011 16:27:47 GMT


Robert Muir commented on LUCENE-3112:

I suppose we could consider changing the index format today to record
which docs are subs... but I think we don't need to. Maybe I should
strengthen the @experimental to explain the risk that a future
reindexing could be required?

I think this would be perfect. I certainly don't want to hold up this
improvement. I just don't want us to end up in a situation later where
we say "if only we had recorded this information; now it's not possible
to do XYZ, because someone COULD have used add/updateDocuments() for
some arbitrary reason and we would 'split' their grouped ids".

We could also include in the note that the various existing
IndexSorters/Splitters are unaware of this, so use with caution :)

> Add IW.add/updateDocuments to support nested documents
> ------------------------------------------------------
>                 Key: LUCENE-3112
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: LUCENE-3112.patch
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you want to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable<Document>), updateDocuments(Term
> delTerm, Iterable<Document>)) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.
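
The guarantee described in the issue text above (a block of documents gets
sequential docIDs in one segment, and merging never breaks the block up) can
be illustrated with a toy model. This is only a sketch under stated
assumptions, not Lucene's implementation: the class BlockIndexSketch, its
methods flush() and mergeAll(), and the use of plain strings as documents are
all hypothetical names invented for illustration. Only the addDocuments name
is borrowed from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, NOT Lucene code: a block added under one lock keeps contiguous docIDs. */
class BlockIndexSketch {
    /** A segment is an ordered list of documents; a docID is a position in it. */
    private final List<List<String>> flushedSegments = new ArrayList<>();
    private List<String> buffer = new ArrayList<>();

    /** Adds the whole block while holding the lock, so flush() cannot run mid-block. */
    synchronized int addDocuments(Iterable<String> block) {
        int firstDocId = buffer.size();
        for (String doc : block) {
            buffer.add(doc);
        }
        return firstDocId; // docID of the block's first document within its segment
    }

    /** Turns the in-memory buffer into a new segment (hypothetical stand-in for a flush). */
    synchronized void flush() {
        if (!buffer.isEmpty()) {
            flushedSegments.add(buffer);
            buffer = new ArrayList<>();
        }
    }

    /** Merges by concatenating whole segments in order; no segment is ever split. */
    synchronized List<String> mergeAll() {
        flush();
        List<String> merged = new ArrayList<>();
        for (List<String> segment : flushedSegments) {
            merged.addAll(segment);
        }
        flushedSegments.clear();
        flushedSegments.add(merged);
        return new ArrayList<>(merged); // copy, so callers cannot alter the segment
    }

    public static void main(String[] args) {
        BlockIndexSketch writer = new BlockIndexSketch();
        writer.addDocuments(Arrays.asList("unrelated-doc"));
        writer.flush(); // a flush between calls is fine; it cannot land inside a block
        writer.addDocuments(Arrays.asList("block-doc-1", "block-doc-2", "block-doc-3"));
        System.out.println(writer.mergeAll());
    }
}
```

In this model, calling a single-document add three times would release the
lock between calls, so a flush could place the three documents in different
segments. That is the gap the proposed multi-document call is meant to close.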

This message is automatically generated by JIRA.