lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
Date Tue, 03 Sep 2013 19:05:53 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756931#comment-13756931 ]

Shai Erera commented on LUCENE-5189:
------------------------------------

OK, so now I get your point. The problem is that we pass FI.attributes to the Codec with, say,
an attribute 'foo=bar'. The Codec, unaware that this is an update, looks at the given numericFields
and decides to encode them using method "bar2", so it records 'foo=bar2' in the attributes,
but those attributes get lost because they're not written back to FIS. Do I understand correctly?
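
To make the failure mode concrete, here is a minimal sketch of it. It uses plain maps instead of Lucene classes; the class and method names are hypothetical, and the only thing taken from the discussion above is the 'foo=bar' / 'foo=bar2' example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for FIS and FI.attributes.
final class LostAttributeSketch {

    /** Returns the value of 'foo' that a reader would load after the update. */
    static String attributeSeenByReaderAfterUpdate() {
        // What was persisted in FIS when the segment was first written.
        Map<String, String> persistedAttributes = new HashMap<>();
        persistedAttributes.put("foo", "bar");

        // The Codec receives an in-memory copy of the attributes for the update.
        Map<String, String> passedToCodec = new HashMap<>(persistedAttributes);

        // Unaware that this is an update, the Codec picks a different encoding
        // and records it on the copy it was given.
        passedToCodec.put("foo", "bar2");

        // FIS is never rewritten, so the reader still sees the old value.
        return persistedAttributes.get("foo");
    }

    public static void main(String[] args) {
        System.out.println(attributeSeenByReaderAfterUpdate());
    }
}
```

The reader ends up decoding data written as "bar2" with the attribute 'foo=bar', which is the mismatch described above.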

Of course, we could say that since the Codec has to peek into SWS.isFieldUpdate, thereby making
it updates-aware, it should not encode stuff in a different format, but SWS.isFieldUpdate
is not enough to enforce that.

I don't think that gen'ing FIS solves the problem of obtaining the right DVF in the first
place. Sure, after we do that, the Codec can put whatever attributes it wants, and they will
be recorded in the new FIS.gen.

But maybe we can solve these two problems by gen'ing FIS:

* Add FieldInfo.dvGen. The Codec will receive the FieldInfos with their dvGen bumped up.
* Codec can choose to look at FI.dvGen and pull the right DVF e.g. like PerField does.
** Or it can choose to completely ignore it, and always write updates using the new format.
* Codec is free to record whatever attributes it wants on this FI. Since we gen FIS, they
will be recorded and used by the reader.
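
A rough sketch of how the points above could fit together. This is not the patch and not Lucene's API: ToyFieldInfo, ToyFieldInfosStore and encode are hypothetical stand-ins, and starting dvGen at 0 is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types exist in Lucene.
final class DvGenSketch {

    /** Stand-in for FieldInfo, extended with the proposed dvGen. */
    static final class ToyFieldInfo {
        final String name;
        final long dvGen;
        final Map<String, String> attributes;

        ToyFieldInfo(String name, long dvGen, Map<String, String> attributes) {
            this.name = name;
            this.dvGen = dvGen;
            this.attributes = new HashMap<>(attributes); // defensive copy
        }

        /** Copy of this field info with dvGen bumped, as handed to the Codec for an update. */
        ToyFieldInfo nextGen() {
            return new ToyFieldInfo(name, dvGen + 1, attributes);
        }
    }

    /** Stand-in for gen'd FIS: one recorded field info per generation. */
    static final class ToyFieldInfosStore {
        private final TreeMap<Long, ToyFieldInfo> byGen = new TreeMap<>();

        void write(ToyFieldInfo fi) { byGen.put(fi.dvGen, fi); }
        ToyFieldInfo get(long gen) { return byGen.get(gen); }
        ToyFieldInfo latest() { return byGen.lastEntry().getValue(); }
    }

    /** Hypothetical Codec step: choose a format from dvGen and record it on the FI. */
    static ToyFieldInfo encode(ToyFieldInfo fi) {
        // Assumption for this sketch: gen 0 is the original write, later gens are updates.
        String format = (fi.dvGen == 0) ? "bar" : "bar2";
        fi.attributes.put("foo", format);
        return fi;
    }

    /** Writes gen 0, then one update, and reports what the reader would load. */
    static String demo(ToyFieldInfosStore store) {
        ToyFieldInfo original = encode(new ToyFieldInfo("price", 0, new HashMap<>()));
        store.write(original);
        store.write(encode(original.nextGen()));
        ToyFieldInfo latest = store.latest();
        return latest.attributes.get("foo") + "@gen" + latest.dvGen;
    }

    public static void main(String[] args) {
        System.out.println(demo(new ToyFieldInfosStore()));
    }
}
```

Because each generation of the field info is recorded separately, whatever the Codec writes into the attributes during the update is what the reader loads for that generation, while the attributes of the earlier generation stay intact.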

What do you think?
                
> Numeric DocValues Updates
> -------------------------
>
>                 Key: LUCENE-5189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5189
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the amount of
> changes is immense and hard to follow/consume. The reason is that we targeted postings, stored
> fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are a couple
> of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the values of all
> the documents in a segment for the updated field (similar to how livedocs work, and previously
> norms).
> * It's a fairly contained issue, attempting to handle just one data type to update, yet
> requires many changes to core code which will also be useful for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the data types
> in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the changes.
