lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
Date Thu, 21 Nov 2013 06:16:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828524#comment-13828524 ]

Shai Erera commented on LUCENE-5189:
------------------------------------

You're right, Simon. The updates are buffered in memory in their raw form until a flush is
needed (e.g. commit() or an NRT open). At that point they are resolved and written to the Directory.
This is where updates differ from deletes: deletes are small enough to keep in memory in their
resolved form, but updates are not. A single update can affect millions of documents, and each
affected document takes a long (the updated value). Future work could perhaps distinguish between
small and large updates and keep the small ones in memory. But I believe that would affect a lot
more code: e.g. SegmentReader would need to be aware of both in-memory and on-disk NDV and do a
kind of merge between them whenever an NDV is requested for such a field. I imagine that code
would not be pretty.
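
To make the buffer-then-resolve distinction concrete, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not the code in the patch: the class BufferedUpdatesSketch, its methods, the term "category:books" and the in-memory long[] standing in for the on-disk values are all hypothetical and do not correspond to Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are Lucene classes or methods. */
final class BufferedUpdatesSketch {

    /** A raw update: "set the field to value in every document matching term". */
    static final class RawUpdate {
        final String term;
        final long value;

        RawUpdate(String term, long value) {
            this.term = term;
            this.value = value;
        }
    }

    private final Map<String, int[]> postings; // term -> doc IDs in this toy segment
    private final List<RawUpdate> buffered = new ArrayList<>();
    private long[] column; // stand-in for the field's per-document values on disk

    BufferedUpdatesSketch(Map<String, int[]> postings, int maxDoc) {
        this.postings = postings;
        this.column = new long[maxDoc];
    }

    /** Buffering costs one small object per update, however many documents match. */
    void update(String term, long value) {
        buffered.add(new RawUpdate(term, value));
    }

    int bufferedCount() {
        return buffered.size();
    }

    /** At flush, resolve terms to doc IDs and rewrite the whole column; later updates win. */
    long[] flush() {
        long[] next = column.clone();
        for (RawUpdate u : buffered) {
            int[] docs = postings.getOrDefault(u.term, new int[0]);
            for (int doc : docs) {
                next[doc] = u.value;
            }
        }
        buffered.clear();
        column = next;
        return next;
    }

    /** Heap needed to hold one resolved update: one long per affected document. */
    static long resolvedBytes(long affectedDocs) {
        return affectedDocs * Long.BYTES;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("category:books", new int[] {0, 2});
        BufferedUpdatesSketch segment = new BufferedUpdatesSketch(postings, 3);
        segment.update("category:books", 7L);
        System.out.println("buffered raw updates: " + segment.bufferedCount());
        System.out.println("column after flush: " + Arrays.toString(segment.flush()));
        System.out.println("bytes for 1M resolved docs: " + resolvedBytes(1_000_000L));
    }
}
```

In this model a raw update stays a single small object until flush, whereas holding the resolved form of an update that touches 10 million documents would need roughly 80 MB for the values alone (8 bytes each).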

> Numeric DocValues Updates
> -------------------------
>
>                 Key: LUCENE-5189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5189
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 4.6, 5.0
>
>         Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch,
>                      LUCENE-5189-renames.patch, LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch,
>                      LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
>                      LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
>                      LUCENE-5189.patch, LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the amount of
> changes is immense and hard to follow/consume. The reason is that we targeted postings, stored
> fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are a couple
> of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the values of all
>   the documents in a segment for the updated field (similar to how livedocs work, and previously
>   norms).
> * It's a fairly contained issue, attempting to handle just one data type to update, yet it
>   requires many changes to core code which will also be useful for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the data types
>   in Lucene at once ... we can do that gradually.
> I have a working patch already, which I'll upload next, explaining the changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

