Date: Wed, 16 Feb 2011 13:53:00 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <1525738845.20569.1297864380446.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <916915244.3851.1296599369007.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

     [ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2903:
--------------------------------

    Attachment: for_pfor.patch

Nice results Hao!

One idea for the low-frequency multi-term queries (foo*, etc.) could be in the attached patch: I only implemented this for the existing FrameOfRef and PatchedFrameOfRef, but perhaps you could steal/test the idea with your implementation.

In these cases I switched them over to a single-byte header instead of an int. This means less per-block overhead and a slightly smaller index (maybe 1-2%?).

It might be more useful if we switch your codec over from the Sep layout to the interleaved (Fixed) layout, to make a more efficient skipBlock()... but this interleaved layout is still a work in progress.

> Improvement of PForDelta Codec
> ------------------------------
>
>                 Key: LUCENE-2903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2903
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: hao yan
>         Attachments: LUCENE-2903.patch, LUCENE-2903.patch, for_pfor.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch: FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> FrameOfRef is a very basic one which is essentially a binary encoding (may result in a huge index size).
> PatchedFrameOfRef is the implementation based on the original version of PForDelta in the literature.
> PatchedFrameOfRef2 is my previous implementation, which is improved this time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the old PForDelta did not support very large exceptions (since
> Simple16 does not support very large numbers). This has been fixed in the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other two PForDelta implementations in the bulk branch (FrameOfRef and PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef for almost all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec produces the smallest index of all 4 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results were obtained by running with "-server" instead of "-client".
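
For readers who want to see the single-byte header idea from the comment above in concrete form, here is a minimal sketch in Java. It is not the code from for_pfor.patch and does not use any Lucene API: the class name ByteHeaderForSketch, the method names, and the block layout (one header byte holding the bits-per-value, followed by a bit-packed payload) are all assumptions made for illustration. It also leaves out the exception ("patch") handling that PForDelta adds on top of plain frame-of-reference packing.

```java
import java.util.Arrays;

// Hypothetical illustration only: a frame-of-reference style block whose
// header is a single byte (bits per value) rather than a 4-byte int.
final class ByteHeaderForSketch {

    // Bits needed for the largest value in the block (at least 1).
    static int bitsRequired(int[] values) {
        int max = 0;
        for (int v : values) {
            max |= v; // for non-negative values, the OR has the same top bit as the maximum
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    // Layout (assumed): [1 header byte = bits per value][bit-packed values, LSB first].
    static byte[] encode(int[] values) {
        int numBits = bitsRequired(values);
        int payloadBytes = (values.length * numBits + 7) / 8;
        byte[] out = new byte[1 + payloadBytes];
        out[0] = (byte) numBits; // 1..32 always fits in one byte
        int bitPos = 0;
        for (int v : values) {
            for (int b = 0; b < numBits; b++) {
                if (((v >>> b) & 1) != 0) {
                    out[1 + (bitPos >>> 3)] |= (byte) (1 << (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    // Reads the header byte, then unpacks 'count' values of that width.
    static int[] decode(byte[] block, int count) {
        int numBits = block[0] & 0xFF;
        int[] values = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < numBits; b++) {
                int bit = (block[1 + (bitPos >>> 3)] >>> (bitPos & 7)) & 1;
                v |= bit << b;
                bitPos++;
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 1, 7, 2, 5, 1, 4, 6}; // made-up doc deltas
        byte[] block = encode(deltas);
        System.out.println("encoded bytes: " + block.length);
        System.out.println("decoded: " + Arrays.toString(decode(block, deltas.length)));
    }
}
```

The saving is 3 bytes per block relative to an int header, which matters most when blocks are small or the packed payload is narrow. How that translates into index size depends on the real block size and data, which this sketch does not model.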