lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2962) Skip data should be inlined into the postings lists
Date Thu, 17 Mar 2011 10:36:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007883#comment-13007883 ]

Michael McCandless commented on LUCENE-2962:
--------------------------------------------

I think this paper is relevant: http://vigna.dsi.unimi.it/ftp/papers/CompressedPerfectEmbeddedSkipLists.pdf

> Skip data should be inlined into the postings lists
> ---------------------------------------------------
>
>                 Key: LUCENE-2962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2962
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>
> Today, we store all skip data as a separate blob at the end of a given term's
> postings (if that term occurs in enough docs to warrant skip data).
> But this adds overhead during decoding -- we have to seek to a different place
> for the initial load, we have to init separate readers, we have to seek again
> while using the lower levels of the skip data, etc.  Also, we have to fully
> decode all skip information even if we are not going to use it (eg if I only
> want docIDs, I still must decode position offset and lastPayloadLength).
> If instead we interleaved skip data into the postings file, we could keep it
> local, and "private" to each file that needs skipping.  This should make it
> less costly to init and then use the skip data, which'd be a good perf gain
> for eg PhraseQuery, AndQuery.
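
To make the interleaving idea in the description concrete, here is a minimal
sketch in Java. It is not Lucene's postings format and not the design proposed
in the issue or in the linked paper: the class name InlineSkipSketch, the
per-block layout [lastDocInBlock, docCount, deltas...], and the single skip
level are all assumptions made for illustration. It only shows how a reader can
advance through one stream, jumping over whole blocks by reading the inlined
skip entry, without seeking to a separate skip blob.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and layout are not taken from Lucene.
public class InlineSkipSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Assumed layout per block: [lastDocInBlock, docCount, delta_1 .. delta_docCount].
    // The two leading ints are the inlined skip entry; deltas continue across blocks.
    static int[] encode(int[] sortedDocIds, int blockSize) {
        List<Integer> out = new ArrayList<>();
        int prev = 0;
        for (int start = 0; start < sortedDocIds.length; start += blockSize) {
            int end = Math.min(start + blockSize, sortedDocIds.length);
            out.add(sortedDocIds[end - 1]); // skip entry: last docID in this block
            out.add(end - start);           // skip entry: number of deltas that follow
            for (int i = start; i < end; i++) {
                out.add(sortedDocIds[i] - prev);
                prev = sortedDocIds[i];
            }
        }
        int[] stream = new int[out.size()];
        for (int i = 0; i < stream.length; i++) {
            stream[i] = out.get(i);
        }
        return stream;
    }

    // Returns the first docID >= target, or NO_MORE_DOCS if there is none.
    static int advance(int[] stream, int target) {
        int pos = 0;
        int base = 0; // last docID of the previous block (0 before the first block)
        while (pos < stream.length) {
            int lastDoc = stream[pos];
            int count = stream[pos + 1];
            if (lastDoc < target) {
                // Whole block is below the target: jump over it without decoding deltas.
                base = lastDoc;
                pos += 2 + count;
                continue;
            }
            // The target falls in this block: decode deltas until we reach it.
            int doc = base;
            for (int i = 0; i < count; i++) {
                doc += stream[pos + 2 + i];
                if (doc >= target) {
                    return doc;
                }
            }
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        int[] stream = encode(new int[] {3, 7, 12, 20, 21, 40, 55}, 3);
        System.out.println(advance(stream, 13)); // skips the first block, then decodes
        System.out.println(advance(stream, 56) == NO_MORE_DOCS);
    }
}
```

A real format would need multiple skip levels, file pointers for positions and
payloads, and variable-length integer encoding; the sketch leaves all of that
out and keeps only the "skip entry sits next to the data it covers" property.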

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

