lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Otis Gospodnetic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7407) Explore switching doc values to an iterator API
Date Tue, 20 Sep 2016 21:47:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507892#comment-15507892
] 

Otis Gospodnetic commented on LUCENE-7407:
------------------------------------------

bq. there is a lot of fun improvements we can make here, in follow-on issues, so that e.g.
LUCENE-7253 (merging of sparse doc values fields) is fixed.

So LUCENE-7253 is where the new Codec work for trunk will go?
Did you maybe create the other issues you mentioned?  Asking because I'm curious what you
have in mind and so I can link+watch.

> Explore switching doc values to an iterator API
> -----------------------------------------------
>
>                 Key: LUCENE-7407
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7407
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: docValues
>         Attachments: LUCENE-7407.patch
>
>
> I think it could be compelling if we restricted doc values to use an
> iterator API at read time, instead of the more general random access
> API we have today:
>   * It would make doc values disk usage more of a "you pay for what
>     what you actually use", like postings, which is a compelling
>     reduction for sparse usage.
>   * I think codecs could compress better and maybe speed up decoding
>     of doc values, even in the non-sparse case, since the read-time
>     API is more restrictive "forward only" instead of random access.
>   * We could remove {{getDocsWithField}} entirely, since that's
>     implicit in the iteration, and the awkward "return 0 if the
>     document didn't have this field" would go away.
>   * We can remove the annoying thread locals we must make today in
>     {{CodecReader}}, and close the trappy "I accidentally shared a
>     single XXXDocValues instance across threads", since an iterator is
>     inherently "use once".
>   * We could maybe leverage the numerous optimizations we've done for
>     postings over time, since the two problems ("iterate over doc ids
>     and store something interesting for each") are very similar.
> This idea has come up many in the past, e.g. LUCENE-7253 is a recent
> example, and very early iterations of doc values started with exactly
> this ;)
> However, it's a truly enormous change, likely 7.0 only.  Or maybe we
> could have the new iterator APIs also ported to 6.x side by side with
> the deprecate existing random-access APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message