Date: Thu, 24 Sep 2009 12:29:12 -0400
Subject: Re: metrics for index ~100M docs
From: Erick Erickson
To: java-user@lucene.apache.org

It's really hard to say anything meaningful here. How many fields? What
kind of sorting do you intend to do? How complex are the queries you
expect? And even if you have meaningful answers to the above, the answer
is still "it depends" (tm).

Beyond that, you could go to SOLR (which is built on Lucene) to handle
distributed searching and a host of other infrastructure issues.

There are certainly Lucene installations out there that are much larger
than the one you're considering, if that helps. But you can create a small
test app *very* quickly that'll help you answer this for your local set of
conditions, which might be a good place to start.

Don't forget the "powered by" section of the Wiki for some ideas:
http://wiki.apache.org/lucene-java/PoweredBy

Best
Erick

On Thu, Sep 24, 2009 at 11:17 AM, Joel Halbert wrote:

> Hi,
>
> Does anyone know of any recent metrics & stats on building out an index
> of ~100mm documents (each doc approx 5k). I'm looking for approx stats
> on time to build, time to query and infrastructure requirements (number
> of machines & spec) to reasonably support an index of such a size.
>
> Thanks,
> Joel