lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: 30 milllion+ docs on a single server
Date Sat, 12 Aug 2006 23:45:15 GMT
The single server is important because I think it will take a lot of 
work to scale it to multiple servers. The index must allow for close to 
real-time updates and additions. It must also remain searchable at all 
times (other than than during the brief period of single updates and 
additions). If it is easy to scale this to multiple servers please tell 
me how.

- Mark
> Why is a single server so important?  I can scale horizontally much 
> cheaper
> than I scale vertically.
> On 8/11/06, Mark Miller <> wrote:
>> I've made a nice little archive application with lucene. I made it to
>> handle our largest need: 2.5 million docs or so on a single server. Now
>> the powers that be say: lets use it for a 30+ million document archive
>> on a single server! (each doc size maybe 10k small as a 1 or
>> 2k) Please tell me why we are in trouble...please tell me why we are
>> not. I have tested up to 2 million docs without much trouble but 30
>> million...the average search will include a sort on a field as
>> well...can I search 30+ million docs with a sort? Man am I worried about
>> that. Maybe the server will have 8 procs and 12 billion gigs of RAM.
>> Mabye. Even still, Tomcat seems to be able to launch with a max of 1.5
>> or 1.6 gig of Ram in Windows. What do you think? 30 million+ sounds like
>> too much of a load to me for a single server. Not that they care what I
>> think...I only wrote the thing (man I hate my job, offer me a new one :)
>> )...please...comments?
>> Cheers,
>> Miserable Mark
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message