lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: Scalability Questions
Date Fri, 20 Oct 2006 07:46:23 GMT
Hi Guerre,

The reason you haven't received any answers yet is because this is pretty impossible the answer.....and
so I'll try to answer your questions now, at 3:40 AM. ;)

----- Original Message ----
From: Guerre Bear <>
Sent: Wednesday, October 18, 2006 3:14:00 PM
Subject: Scalability Questions

Hello All,

Lucene looks very interesting to me.  I was wondering if any of you could
comment on a few questions:

1) Assuming I use a typical server such as a dual-core dual-processor Dell
2950, about how many files can Lucene index and still have a sub-two-second
search speed for a simple search string such as "invoice 2005 mitsubishi"?
For the sake of argument, I figure that a typical file will have about 30KB
of text in it.

Impossible to answer.  It would depend whether your index is in RAM (cached) or not, how many
queries are hitting it symultaneously, how complex the queries are, how deep in search results
you are going, etc.

2) How many of these servers would it take to manage an index of one billion
such files?

Hm, 9 zeros.... my semi-wild guess is that this would require 100+ search servers.

3) Are there any HOWTO's on constructing a large Lucene search cluster?

Nope.  People have done it, I've done some of it, but there is no good and publicly available
information.  This would make a good case study for Lucene in Action 2, but my guess is that
no company would want to share their secrets.

4) Roughly how large is the index file in comparison to the size of the
input files?

It depends on whether you store fields or just index them, plus there is also a compression
(gzip -9 equivalent) option.

5) How does Lucene's search performance/scalability compare to some of the
expensive commercial search products such as Fast?  (

I can't go into details here, but I have information from a few _very_ reliable sources that
you should think 2-3-4-5.... times before going with FAST.  Sorry I can't be more specific.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message