Date: Thu, 17 Mar 2011 10:36:29 +0000 (UTC)
From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <806575579.8612.1300358189613.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1625075136.11089.1299776339323.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Commented: (LUCENE-2962) Skip data should be inlined into the postings lists

[ 
https://issues.apache.org/jira/browse/LUCENE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007883#comment-13007883 ]

Michael McCandless commented on LUCENE-2962:
--------------------------------------------

I think this paper is relevant: http://vigna.dsi.unimi.it/ftp/papers/CompressedPerfectEmbeddedSkipLists.pdf

> Skip data should be inlined into the postings lists
> ---------------------------------------------------
>
>                 Key: LUCENE-2962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2962
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>
> Today, we store all skip data as a separate blob at the end of a given term's postings (if that term occurs in enough docs to warrant skip data).
>
> But this adds overhead during decoding -- we have to seek to a different place for the initial load, we have to init separate readers, we have to seek again while using the lower levels of the skip data, etc. Also, we have to fully decode all skip information even if we are not going to use it (e.g. if I only want docIDs, I still must decode position offset and lastPayloadLength).
>
> If instead we interleaved skip data into the postings file, we could keep it local, and "private" to each file that needs skipping. This should make it less costly to init and then use the skip data, which'd be a good perf gain for e.g. PhraseQuery, AndQuery.
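
To make the interleaving idea from the issue description concrete, here is a minimal single-level sketch. It is not Lucene's postings format and not the scheme from the linked paper: the class name InlineSkipSketch, the block layout [lastDocInBlock, count, delta...], and the use of plain ints instead of a compressed byte stream are all assumptions made for illustration. It only covers docIDs; positions and payloads are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: skip entries stored inline, in front of each block of doc deltas. */
public class InlineSkipSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Encodes sorted docIDs as consecutive blocks: [lastDocInBlock, count, delta...]. */
    public static int[] encode(int[] sortedDocs, int blockSize) {
        List<Integer> out = new ArrayList<>();
        int prev = 0; // deltas are relative to the previous docID (0 before the first doc)
        for (int start = 0; start < sortedDocs.length; start += blockSize) {
            int end = Math.min(start + blockSize, sortedDocs.length);
            out.add(sortedDocs[end - 1]); // inline skip entry: last docID in this block
            out.add(end - start);         // number of deltas that follow
            for (int i = start; i < end; i++) {
                out.add(sortedDocs[i] - prev);
                prev = sortedDocs[i];
            }
        }
        int[] stream = new int[out.size()];
        for (int i = 0; i < stream.length; i++) {
            stream[i] = out.get(i);
        }
        return stream;
    }

    /** Returns the first docID >= target, or NO_MORE_DOCS if there is none. */
    public static int advance(int[] stream, int target) {
        int pos = 0;
        int prev = 0;
        while (pos < stream.length) {
            int lastDoc = stream[pos];
            int count = stream[pos + 1];
            if (lastDoc < target) {
                // The whole block is below target: jump over it without decoding deltas.
                prev = lastDoc;
                pos += 2 + count;
                continue;
            }
            // The target falls inside this block: decode deltas until we reach it.
            int doc = prev;
            for (int i = 0; i < count; i++) {
                doc += stream[pos + 2 + i];
                if (doc >= target) {
                    return doc;
                }
            }
            throw new IllegalStateException("block header does not match its deltas");
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        int[] stream = encode(new int[] {3, 7, 12, 40, 41, 90, 150, 151, 400}, 4);
        System.out.println(advance(stream, 41));  // prints 41
        System.out.println(advance(stream, 100)); // prints 150
        System.out.println(advance(stream, 401)); // prints 2147483647 (NO_MORE_DOCS)
    }
}
```

Because each skip entry sits directly in front of the block it describes, the reader never seeks to a separate skip blob and never needs a second reader. A real format would presumably need multiple skip levels and per-file skip entries (docs, positions, payloads) to get the gains described in the issue.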