Date: Thu, 22 Oct 2009 22:39:14 -0700
Message-ID: <4b124c310910222239k7d1de1c1q8cb7c54165ad2f89@mail.gmail.com>
Subject: Re: Maximum index file size
From: Jake Mannix
To: java-user@lucene.apache.org

On Thu, Oct 22, 2009 at 10:29 PM, Hrishikesh Agashe <hrishikesh_agashe@persistent.co.in> wrote:

> Can I create an index file with very large size, like 1 TB or so? Is there
> any limit on how large index file one can create? Also, will I be able to
> search on this 1 TB index file at all?

Leaving aside the question of hardware or JVM limits on monstrous files, this question (can you search this file) is easier: if you've got, say, ten billion documents in one index, and you have a query which is going to hit maybe even just 0.1% of the documents, you'll need to score 10 million hits in the course of that query. To do this in under a second means you only have 100 nanoseconds to look at each document. If your query hits 1% of your documents, you're down to 10 ns per document.

I've never tried searching a 1TB index, but I'd say that's pushing it. Is there a reason you can't shard your index, and instead put maybe 20 shards of 50GB (or better, 100 shards of 10GB) each on a variety of machines, and just merge results?

  -jake
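
For anyone who wants to replay the arithmetic above with their own numbers, here is a rough sketch in Python. It is not Lucene code: `per_doc_budget_ns` and `merge_top_k` are hypothetical helper names made up for illustration, and the merge step assumes each shard hands back (score, doc id) pairs whose scores are directly comparable across shards, which a real deployment would have to arrange for.

```python
import heapq
from typing import Iterable, List, Tuple


def per_doc_budget_ns(total_docs: int, hit_fraction: float,
                      latency_s: float = 1.0) -> float:
    """Nanoseconds available per matching document if the whole
    query must finish within latency_s seconds."""
    hits = total_docs * hit_fraction
    return latency_s * 1e9 / hits


def merge_top_k(shard_results: Iterable[List[Tuple[float, str]]],
                k: int) -> List[Tuple[float, str]]:
    """Combine per-shard hit lists into one global top-k list.

    Assumption: every shard returns (score, doc_id) pairs and the
    scores are comparable across shards.
    """
    all_hits = (hit for shard in shard_results for hit in shard)
    # nlargest returns the k best hits, highest score first.
    return heapq.nlargest(k, all_hits)


if __name__ == "__main__":
    # The two cases from the message: 0.1% and 1% of ten billion docs.
    for fraction in (0.001, 0.01):
        budget = per_doc_budget_ns(10_000_000_000, fraction)
        print(f"hit fraction {fraction}: about {budget:.0f} ns per document")

    # Two made-up shards, each already holding its own best hits.
    shards = [
        [(0.90, "shard0-doc1"), (0.20, "shard0-doc2")],
        [(0.95, "shard1-doc7"), (0.70, "shard1-doc3")],
    ]
    print(merge_top_k(shards, k=2))
```

The point of the merge step is that each shard only has to score its own slice of the documents, so the per-document budget applies per machine, and the coordinator only has to look at k hits from each shard.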