Reply-To: simon.willnauer@gmail.com
Date: Tue, 2 Nov 2010 20:03:17 +0100
Subject: Re: How to handle more than Integer.MAX_VALUE documents?
From: Simon Willnauer
To: java-user@lucene.apache.org

On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote:
> 2billion is a hard limit. Usually people split indexes into multiple
> index long before this, and use the parallel multi reader (I think) to
> read from all of the sub-indexes.
>
> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
> wrote:
>>
>> Hi,
>>
>> Now lucene uses integer as document id, so it means we cannot have more
>> than 2^31-1 documents within one collection? Even if we use MultiSearcher
>> the document id is still integer so it seems this is still a problem?

This is really the limit of a segment. I think you can write your own
collector and collect documents with (absolute) doc ids higher than
INT_MAX. Still, if you reach the limit of INT_MAX documents you should
really rethink the way your search works and apply some sharding
techniques. I haven't been up to that many docs in a single index
myself, but I think it should work to have multiple segments with
INT_MAX documents each, since we search at the segment level, provided
your collector supports it.
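
To make the "absolute doc id" idea concrete, here is a minimal sketch in
plain Java. It does not use any Lucene API: the class LongDocBases and
its methods are hypothetical names made up for illustration. It only
shows the arithmetic a custom collector would need, assuming it keeps a
64-bit base offset per segment and adds the segment-local int doc id to
it. Whether a given Lucene version lets an index grow this large is a
separate question.

```java
// Hypothetical helper, not part of Lucene: maps (segment, local int doc id)
// to a 64-bit absolute doc id by keeping one long base offset per segment.
final class LongDocBases {
    private final int[] sizes;   // number of documents in each segment
    private final long[] bases;  // absolute id of each segment's first document
    private final long total;    // total number of documents across segments

    LongDocBases(int[] segmentSizes) {
        sizes = segmentSizes.clone();
        bases = new long[sizes.length];
        long running = 0L;
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] < 0) {
                throw new IllegalArgumentException("negative segment size");
            }
            bases[i] = running;
            running += sizes[i]; // long arithmetic, so no int overflow
        }
        total = running;
    }

    /** Absolute (global) id of a segment-local doc id. */
    long absolute(int segment, int localDoc) {
        if (localDoc < 0 || localDoc >= sizes[segment]) {
            throw new IndexOutOfBoundsException("localDoc out of range");
        }
        return bases[segment] + localDoc;
    }

    /** Total document count, which may exceed Integer.MAX_VALUE. */
    long total() {
        return total;
    }

    public static void main(String[] args) {
        // Two full segments plus a small one (sizes chosen for illustration).
        LongDocBases b = new LongDocBases(
                new int[] {Integer.MAX_VALUE, Integer.MAX_VALUE, 10});
        System.out.println(b.absolute(2, 5));
        System.out.println(b.total());
    }
}
```

The per-segment doc ids stay ints; only the sum with the base is a long,
which is why the collector is the place that has to support it.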

simon

>>
>> We have been using lucene for some time and our document count is growing
>> rather rapidly, maybe this is a much-discussed issue already, but I did not
>> find the lead, any pointer would be really appreciated.
>>
>> Thanks very much for helps, Lisheng
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> --
> Lance Norskog
> goksron@gmail.com