Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Message-ID: <20040507064249.56677.qmail@web12704.mail.yahoo.com>
Date: Thu, 6 May 2004 23:42:49 -0700 (PDT)
From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
Subject: Re: Query performance on a 315 Million document index (1TB)
To: Lucene Users List <lucene-user@jakarta.apache.org>
In-Reply-To: <20040506234755.21F8C1CE305@ws3-6.us4.outblaze.com>

That's big.  I have not created indices that large with Lucene, but I
would expect disk I/O to be the biggest bottleneck.  That is why Nutch
has distributed search options built in, and their demo has 'only'
100M documents.  Perhaps you can mimic Nutch's approach to distributed
indexing and searching.
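
To make the idea concrete, here is a rough sketch of the scatter-gather
pattern behind distributed search: split the documents across several
smaller indexes, query each one in parallel, and merge the per-shard
top hits.  This is not Nutch's or Lucene's code.  The in-memory dict
"shards", the modulo partitioning, and the raw term-count scoring are
all assumptions made for illustration; in a real setup each shard would
be its own Lucene index on its own machine and disks.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partition(docs, n_shards):
    """Assign documents to shards by doc_id modulo n_shards (assumed scheme)."""
    shards = [{} for _ in range(n_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % n_shards][doc_id] = text
    return shards


def search_shard(shard, term, k):
    """Return up to k (score, doc_id) hits from one shard, best first.

    The score is a plain term count, a hypothetical stand-in for a real
    relevance score.
    """
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)


def distributed_search(shards, term, k=10):
    """Query every shard in parallel and merge the per-shard top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    merged = [hit for part in partials for hit in part]
    # Each shard already returned its own best k, so the global best k
    # must be among the merged candidates.
    return heapq.nlargest(k, merged)


if __name__ == "__main__":
    docs = {
        0: "lucene index",
        1: "lucene lucene search",
        2: "nutch crawler",
        3: "lucene lucene lucene",
    }
    print(distributed_search(partition(docs, 2), "lucene", k=2))
```

The point of the design is that each shard only touches its own slice
of the index, so the disk I/O per query is spread over several
machines.  One caveat: term counts are directly comparable across
shards, but scores that depend on per-index statistics may not be
without extra work.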

Otis

--- Will Allen <wga22@email.com> wrote:
> Hi,
> 	I am considering a project that would index 315+ million documents.
> I am comfortable that the indexing will work well in creating an
> index ~800GB in size, but am concerned about the query performance.
> (Is this a bad assumption?)
> 
> What are the bottlenecks of performance as an index scales?  Memory?
> Cost is not a concern, so what would be the shortcomings of a
> theoretical machine with 16GB of RAM, 4-16 CPUs and 1-2 terabytes
> of space?  Would it be better to cluster machines to break apart
> the query?
> 
> Thank you for your serious responses,
> Will Allen
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

