lucene-java-user mailing list archives

From Joel Halbert <>
Subject Re: metrics for index ~100M docs
Date Thu, 24 Sep 2009 16:33:57 GMT
I found this thread pretty useful:

-----Original Message-----
From: Erick Erickson <>
Subject: Re: metrics for index ~100M docs
Date: Thu, 24 Sep 2009 12:29:12 -0400

It's really hard to say anything meaningful here. How many fields? What kind
of sorting do you intend to do? How complex are the queries you expect?

And even if you have meaningful answers to the above,
then "it depends" (tm).

Then you could go to SOLR (which is built on Lucene) to handle
distributed searching and a host of other infrastructure issues.

There are certainly Lucene installations out there that are much larger
than what you're considering, if that helps.

But you can create a small test app *very* quickly that'll help you
answer this for your local set of conditions, which might be a good
place to start.
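Once such a test app gives you numbers for a sample, the rest is arithmetic. The sketch below is a hypothetical helper, not part of Lucene. It assumes that indexing time and index size grow roughly linearly with document count, which is only an approximation (for example, segment merging does not scale perfectly linearly), so measure at two or more sample sizes before trusting the projection. It also reads the question's "approx 5k" as 5,000 bytes per document, which puts the raw text at about 100M × 5,000 bytes = 500 GB before any index overhead.

```java
/**
 * Hypothetical back-of-the-envelope sizing helper (not a Lucene API).
 * Assumption: build time and index size scale roughly linearly with doc count.
 */
final class IndexSizingEstimate {

    private IndexSizingEstimate() {}

    /** Raw text volume: number of documents times average bytes per document. */
    static long rawCorpusBytes(long docs, long bytesPerDoc) {
        return Math.multiplyExact(docs, bytesPerDoc); // throws on overflow
    }

    /** Linearly scales a value measured on a sample up to the target doc count. */
    static double extrapolate(long sampleDocs, double sampleValue, long targetDocs) {
        if (sampleDocs <= 0) {
            throw new IllegalArgumentException("sampleDocs must be positive");
        }
        return sampleValue * ((double) targetDocs / sampleDocs);
    }

    /** Machines needed if each holds at most bytesPerMachine of index (ceiling division). */
    static long machinesNeeded(long totalIndexBytes, long bytesPerMachine) {
        if (bytesPerMachine <= 0) {
            throw new IllegalArgumentException("bytesPerMachine must be positive");
        }
        return (totalIndexBytes + bytesPerMachine - 1) / bytesPerMachine;
    }

    public static void main(String[] args) {
        long targetDocs = 100_000_000L; // ~100M docs, from the question
        long bytesPerDoc = 5_000L;      // "approx 5k" read as 5,000 bytes (assumption)
        long raw = rawCorpusBytes(targetDocs, bytesPerDoc);
        System.out.printf("Raw corpus: %,d bytes (~%.0f GB)%n", raw, raw / 1e9);

        // Optional: pass your own test-app measurements as
        //   <sampleDocs> <sampleSeconds> <sampleIndexBytes>
        if (args.length == 3) {
            long sampleDocs = Long.parseLong(args[0]);
            double sampleSeconds = Double.parseDouble(args[1]);
            double sampleIndexBytes = Double.parseDouble(args[2]);
            System.out.printf("Projected build time: %.1f hours%n",
                    extrapolate(sampleDocs, sampleSeconds, targetDocs) / 3600.0);
            System.out.printf("Projected index size: ~%.0f GB%n",
                    extrapolate(sampleDocs, sampleIndexBytes, targetDocs) / 1e9);
        }
    }
}
```

Run it with three arguments taken from your own test (documents indexed, seconds taken, resulting index size in bytes) to get a first projection; `machinesNeeded` then turns a projected index size into a machine count once you decide how much index one box should hold.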

Don't forget the "powered by" section of the Wiki for some ideas:


On Thu, Sep 24, 2009 at 11:17 AM, Joel Halbert <> wrote:

> Hi,
> Does anyone know of any recent metrics & stats on building out an index
> of ~100M documents (each doc approx. 5k)? I'm looking for approx. stats
> on time to build, time to query and infrastructure requirements (number
> of machines & spec) to reasonably support an index of such a size.
> Thanks,
> Joel
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
