From: Michael McCandless
To: java-user@lucene.apache.org
Cc: Antony Bowesman
Date: Tue, 5 Apr 2011 06:19:59 -0400
Subject: Re: DocIdSet to represent small numberr of hits in large Document set
In-Reply-To: <4D9AB59D.2050001@thorntothehorn.org>

Can we simply factor out (poach!)
those useful-sounding classes from Nutch into Lucene?

Mike

http://blog.mikemccandless.com

On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman wrote:
> I'm converting from Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4).
>
> Many of our indexes are 5M+ Documents; however, only a small subset of these
> are relevant to any user. As a DocIdSet, backed by a BitSet or OpenBitSet,
> is rather inefficient in terms of memory use, what is the recommended
> DocIdSet implementation to use in this scenario?
>
> Seems like SortedVIntList can be used to store the info, but it has no
> methods to build the list in the first place, requiring an array or bitset
> in the constructor.
>
> I had used Nutch's DocSet and HashDocSet implementations in my 2.3.2
> deployment, but want to move away from that Nutch dependency, so wondered if
> Lucene had a way to do this?
>
> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
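
To make the memory concern in the quoted question concrete, here is a minimal sketch in plain Java with no Lucene dependency. SparseDocIdBuilder is a hypothetical helper, not a Lucene or Nutch class. It collects the few matching doc IDs into a growable int[] and returns them sorted and de-duplicated, which is the kind of array the quoted message says SortedVIntList needs in its constructor. The two size helpers are rough payload estimates only (they ignore object headers), and the figure of 1,000 hits used below is an assumption chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene or Nutch): gathers a small set of doc IDs. */
final class SparseDocIdBuilder {
    private int[] ids = new int[16];
    private int size = 0;

    /** Records one matching document ID; IDs may arrive in any order. */
    void add(int docId) {
        if (docId < 0) {
            throw new IllegalArgumentException("docId must be >= 0");
        }
        if (size == ids.length) {
            ids = Arrays.copyOf(ids, size * 2); // grow geometrically
        }
        ids[size++] = docId;
    }

    /** Returns the collected IDs in ascending order with duplicates removed. */
    int[] toSortedArray() {
        int[] sorted = Arrays.copyOf(ids, size);
        Arrays.sort(sorted);
        int unique = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || sorted[i] != sorted[i - 1]) {
                sorted[unique++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, unique);
    }

    /** Approximate payload of a fixed bitset covering maxDoc documents (one bit each). */
    static long bitSetBytes(int maxDoc) {
        return ((long) maxDoc + 63) / 64 * 8;
    }

    /** Approximate payload of an int[] holding hitCount IDs (four bytes each). */
    static long intArrayBytes(int hitCount) {
        return 4L * hitCount;
    }
}
```

Under these assumptions, bitSetBytes(5000000) comes to 625,000 bytes per user regardless of how many documents match, while intArrayBytes(1000) comes to 4,000 bytes. Whether the sorted array is then wrapped in SortedVIntList or in a custom DocIdSet is left open here, as it is in the thread.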