Message-ID: <7452729.1147709886297.JavaMail.jira@brutus>
Date: Mon, 15 May 2006 16:18:06 +0000 (GMT+00:00)
From: "paul.elschot (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-328) Some utilities for a compact sparse filter

     [ http://issues.apache.org/jira/browse/LUCENE-328?page=all ]

paul.elschot updated LUCENE-328:
--------------------------------

    Attachment: SkipFilter1.patch

This patches Filter.java and IndexSearcher.java.
Filter.java is modified to implement SkipFilter, as a backward compatible first step towards gradually making Filter independent of BitSet.
IndexSearcher.java is modified to test for a DocNrSkipper from a given Filter, and to use it when one is available. In that case skipTo() is also used on the scorer of the query being filtered.

This patch requires org.apache.lucene.util.DocNrSkipper, which is available at this issue. Also required is org.apache.lucene.search.SkipFilter, which is available at LUCENE-330.

The patch also contains some commented test code for Filter.java. This test code always provides a DocNrSkipper (from the BitSet). With and without this test code, all tests pass here.

When extending Filter in this way, SkipFilter may not be necessary at all. I left it in to allow a path forward to complete independence from BitSet. In case SkipFilter stays, it would be good to add one or more new methods to IndexSearcher allowing a SkipFilter to filter a query.

The DocNrSkipper interface contains only one method: nextDocNr(int docNr). It may be good for performance to also add a nextDocNr() method without any argument, much like skipTo(target) and next() on Scorer. In other words, I do not consider DocNrSkipper stable at this moment.

I don't think this patch should be added to release 2.0.
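For readers without the attachments at hand, here is a minimal sketch of how a DocNrSkipper obtained from a BitSet could drive skipping during filtering. It is not the code in SkipFilter1.patch. The return convention of nextDocNr (first passing document number at or after the argument, -1 when there is none) is an assumption, and the class SkipFilterSketch, the helpers fromBitSet, skipTo and filter, and the sorted int array standing in for a Scorer are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class SkipFilterSketch {

    /** Assumed shape of the interface; the -1 "exhausted" convention is an assumption. */
    interface DocNrSkipper {
        /** Returns the smallest passing document number >= docNr, or -1 if there is none. */
        int nextDocNr(int docNr);
    }

    /** Hypothetical helper: a skipper backed by a BitSet with one set bit per passing document. */
    static DocNrSkipper fromBitSet(final BitSet bits) {
        return new DocNrSkipper() {
            public int nextDocNr(int docNr) {
                return bits.nextSetBit(docNr); // -1 when no bit is set at or after docNr
            }
        };
    }

    /**
     * Stand-in for skipTo(target) on a scorer: index of the first entry >= target
     * in docs[from..], or docs.length when there is none. docs must be sorted and unique.
     */
    static int skipTo(int[] docs, int from, int target) {
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Collects the query documents that pass the filter, letting each side skip ahead to the other. */
    static List<Integer> filter(int[] queryDocs, DocNrSkipper skipper) {
        List<Integer> hits = new ArrayList<Integer>();
        int i = 0;
        while (i < queryDocs.length) {
            int filterDoc = skipper.nextDocNr(queryDocs[i]);
            if (filterDoc < 0) {
                break; // the filter has no further documents
            }
            if (filterDoc == queryDocs[i]) {
                hits.add(filterDoc); // both sides agree on this document
                i++;
            } else {
                // The filter is ahead: skip the query side forward to it.
                i = skipTo(queryDocs, i + 1, filterDoc);
            }
        }
        return hits;
    }
}
```

With this shape neither side visits documents that the other side has already ruled out, which is the benefit the message attributes to using skipTo() on the scorer together with the skipper.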
> Some utilities for a compact sparse filter
> ------------------------------------------
>
>          Key: LUCENE-328
>          URL: http://issues.apache.org/jira/browse/LUCENE-328
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: CVS Nightly - Specify date in submission
>  Environment: Operating System: other
>               Platform: Other
>     Reporter: paul.elschot
>     Assignee: Lucene Developers
>     Priority: Minor
>  Attachments: AndDocNrSkipper.java, AndDocNrSkipper.java, BitSetSortedIntList.java, DocNrSkipper.java, DocNrSkipper.java, IntArraySortedIntList.java, IntArraySortedIntList.java, OrDocNrSkipper.java, OrDocNrSkipper.java, SkipFilter1.patch, SortedVIntList.java, SortedVIntList.java, SortedVIntList.java, TestDocNrSkippers.java, TestDocNrSkippers.java, TestSortedVIntList.java, TestSortedVIntList.java, TestSortedVIntList.java
>
> Two files are attached that might form the basis for an alternative
> filter implementation that is more memory efficient than one bit
> per doc when less than about 1/8 of the docs pass through the filter.
>
> The document numbers are stored in RAM as VInt's from the Lucene index
> format. These VInt's encode the difference between two successive
> document numbers, much like a PositionDelta in the Positions:
> http://jakarta.apache.org/lucene/docs/fileformats.html
>
> The getByteSize() method can be used to verify the compression
> once a SortedVIntList is constructed.
> The precise conditions under which this is more memory efficient than
> one bit per document are not easy to specify in advance.

-- 
This message is automatically generated by JIRA.
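The quoted issue description stores the differences between successive document numbers as VInts. The following is a minimal sketch of that idea, not the attached SortedVIntList.java. It assumes the usual Lucene VInt layout (seven data bits per byte, low-order bits first, high bit set when more bytes follow), and the class VIntDeltaSketch and its methods encode, decode and bitSetByteSize are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;

class VIntDeltaSketch {

    /** Encodes sorted, non-negative document numbers as VInt deltas. */
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int delta = doc - prev; // difference to the previous document number
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta); // last byte, continuation bit clear
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Decodes count document numbers from bytes produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
            }
            prev += delta; // undo the delta encoding
            docs[i] = prev;
        }
        return docs;
    }

    /** Bytes needed for one bit per document, for comparison. */
    static int bitSetByteSize(int maxDoc) {
        return (maxDoc + 7) / 8;
    }
}
```

Under these assumptions a delta below 128 takes one byte, so a filter whose gaps all fit in one byte needs about one byte per passing document, against maxDoc / 8 bytes for one bit per document. That is consistent with the "about 1/8" figure in the description. Larger gaps take two or more bytes each, which may be one reason the exact break-even point is hard to state in advance.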