From: eks dev
Date: Mon, 15 Sep 2008 20:24:08 +0000 (GMT)
Subject: Re: docid set compression and boolean docid set operations
To: java-dev@lucene.apache.org
Message-ID: <415711.91640.qm@web27107.mail.ukl.yahoo.com>

> Are you guys interested in helping out on kamikaze?

sure, as much as my schedule permits (slow but steady :).
 
p4delta should be the right way to go. My proposal would be to make it as fast as possible (see Paul's comment about the if in the decompression loop...) for the simplest case, a set of integers (longs?), and make it available as one of the Filter implementations (a fast-track commit into Lucene and more exposure to others who can make it better).
The code in kamikaze is probably almost there (I have not looked into it yet); we just need to open an issue in JIRA and provide a simple patch for basic p4delta functionality and one Filter implementation with a few test cases. That way it becomes useful for the community from the very start (there is also the possibility of mixing it with Paul's code on DocIdSetIterators...).
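To make the "simplest case" concrete, here is a minimal sketch of delta coding a sorted set of doc IDs with fixed-width bit packing and patched exceptions, roughly in the spirit of p4delta. It is not the kamikaze code: the class `PatchedDeltaBlock`, its fields, and the caller-supplied `bits` parameter are all hypothetical, and storing exceptions in side arrays is a simplification I am assuming for readability. The point it illustrates is that the unpack loop does no per-value exception test; large deltas are patched in a separate pass afterwards.

```java
import java.util.Arrays;

/** Hypothetical sketch: delta coding, fixed-width packing, patched exceptions. */
final class PatchedDeltaBlock {
    final int count;      // number of doc IDs in the block
    final int bits;       // bit width used for regular deltas
    final long[] packed;  // deltas packed at `bits` bits each (exception slots stay 0)
    final int[] excPos;   // indexes of deltas that did not fit in `bits`
    final int[] excVal;   // full values of those deltas

    private PatchedDeltaBlock(int count, int bits, long[] packed, int[] excPos, int[] excVal) {
        this.count = count;
        this.bits = bits;
        this.packed = packed;
        this.excPos = excPos;
        this.excVal = excVal;
    }

    static PatchedDeltaBlock encode(int[] sortedDocIds, int bits) {
        if (bits < 1 || bits > 31) throw new IllegalArgumentException("bits must be in 1..31");
        int n = sortedDocIds.length;
        long[] packed = new long[(int) (((long) n * bits + 63) >>> 6)];
        int max = (int) ((1L << bits) - 1);  // largest delta that fits in `bits`
        int[] pos = new int[n];
        int[] val = new int[n];
        int exc = 0;
        int prev = 0;
        for (int i = 0; i < n; i++) {
            int delta = sortedDocIds[i] - prev;
            if (delta < 0) throw new IllegalArgumentException("doc IDs must be sorted and non-negative");
            prev = sortedDocIds[i];
            if (delta > max) {           // too large: record it as an exception
                pos[exc] = i;
                val[exc] = delta;
                exc++;
            } else {
                put(packed, i, bits, delta);
            }
        }
        return new PatchedDeltaBlock(n, bits, packed, Arrays.copyOf(pos, exc), Arrays.copyOf(val, exc));
    }

    int[] decode() {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        // Pass 1: unpack every slot; no exception test inside this loop.
        for (int i = 0; i < count; i++) out[i] = get(packed, i, bits, mask);
        // Pass 2: patch the few deltas that did not fit.
        for (int k = 0; k < excPos.length; k++) out[excPos[k]] = excVal[k];
        // Pass 3: prefix sum turns deltas back into doc IDs.
        for (int i = 1; i < count; i++) out[i] += out[i - 1];
        return out;
    }

    private static void put(long[] packed, int index, int bits, long value) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        packed[word] |= value << shift;
        // A value may straddle two 64-bit words; spill the high bits into the next one.
        if (shift + bits > 64) packed[word + 1] |= value >>> (64 - shift);
    }

    private static int get(long[] packed, int index, int bits, long mask) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = packed[word] >>> shift;
        if (shift + bits > 64) v |= packed[word + 1] << (64 - shift);
        return (int) (v & mask);
    }
}
```

A real implementation would presumably pick `bits` per block from the data and unroll the unpack loop per bit width; the position-dependent check in `get` is kept here only to keep the sketch short.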
This will not bring a huge benefit, but it is definitely a useful option for Filters. The real work then is to use it for the on-disk format and partially replace VInt encoding in the more involved cases I mentioned before, such as (docId delta, term frequency) pairs in Lucene with multilevel skipping information.
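For reference, the VInt baseline that such a format would partially replace can be sketched as below. This is an illustration only: the class `VIntPostings` and its methods are hypothetical names, and I am assuming the layout where the doc delta is shifted left by one and the low bit flags a term frequency of exactly one (otherwise the frequency follows as its own VInt). Skipping information is left out.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of VInt-coded (docId delta, term frequency) pairs. */
final class VIntPostings {

    /** Writes 7 bits per byte, low-order group first; the high bit means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encode(int[] sortedDocIds, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            int delta = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);  // low bit set: frequency is 1, nothing follows
            } else {
                writeVInt(out, delta << 1);        // low bit clear: frequency follows as a VInt
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    /** Returns { docIds, freqs } for the first `count` postings in `data`. */
    static int[][] decode(byte[] data, int count) {
        int[] docIds = new int[count];
        int[] freqs = new int[count];
        Cursor in = new Cursor(data);
        int doc = 0;
        for (int i = 0; i < count; i++) {
            int code = in.readVInt();
            doc += code >>> 1;                     // undo the shift, then undo the delta
            docIds[i] = doc;
            freqs[i] = (code & 1) != 0 ? 1 : in.readVInt();
        }
        return new int[][] { docIds, freqs };
    }

    private static final class Cursor {
        private final byte[] data;
        private int pos;

        Cursor(byte[] data) { this.data = data; }

        int readVInt() {
            byte b = data[pos++];
            int value = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = data[pos++];
                value |= (b & 0x7F) << shift;
            }
            return value;
        }
    }
}
```

Every value read here goes through a data-dependent loop and a branch on the low bit, which is the per-value cost a block-oriented scheme like p4delta is meant to avoid.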

As Paul said, small steps :)  
cheers,
eks 