lucene-java-user mailing list archives

From Mark Miller <>
Subject Re: 30 milllion+ docs on a single server
Date Fri, 11 Aug 2006 23:22:57 GMT
Tomi NA wrote:
> On 8/12/06, Mark Miller <> wrote:
>> I've made a nice little archive application with lucene. I made it to
>> handle our largest need: 2.5 million docs or so on a single server. Now
>> the powers that be say: let's use it for a 30+ million document archive
>> on a single server! (each doc maybe 10k, some as small as 1 or
>> 2k) Please tell me why we are in trouble...please tell me why we are
>> not. I have tested up to 2 million docs without much trouble but 30
>> million...the average search will include a sort on a field as
>> well...can I search 30+ million docs with a sort? Man am I worried about
>> that. Maybe the server will have 8 procs and 12 billion gigs of RAM.
>> Maybe. Even still, Tomcat seems to be able to launch with a max of 1.5
>> or 1.6 gig of Ram in Windows. What do you think? 30 million+ sounds like
>> too much of a load to me for a single server. Not that they care what I
>> think...I only wrote the thing (man I hate my job, offer me a new one :)
>> )...please...comments?
>> Cheers,
>> Miserable Mark
> I don't really understand what you're so worried about. Either it'll
> work well with the setup you have, or it won't. It's really the size
> of it. ;)
> Seriously, you have a number of relatively cheap possibilities at hand
> to improve search performance: storing the index on a RAID 5 disk
> array will let you read the indices very fast, using multicore CPUs,
> adding memory and even if all that isn't good enough, you can always
> use a small cluster (say, 4 nodes) of very, very inexpensive PCs
> filled with a GB of RAM. You don't have to keep them inside the
> regular UPS/backup/vault-protected area as the indices can always be
> rebuilt (unlike e.g. data in transactional systems) and between 4 of
> them they might cost like an entry-level server.
> I'll let the experts speak now. :)
> t.n.a.
Thanks for the tip...I am not too worried...I am miserable because I 
live in Dilbert land, not this particular incident. Spreading to 
multiple servers is a possibility but one I want to avoid...I wrote this 
app on the side since our current product still needs a lot 
of work and thinking about distributing lucene at this point is a little 
much...I never even have time to work on this project as it is because I 
am currently tasked with porting the crap old project to Windows. I need 
to do a bunch to shore up what I have. No one cares though...they think 
that I have done nothing (or can't understand what I have done) while at 
the same time they want to use what I haven't done to do what I made it 
for as well as this new super archive of 30 million+ docs...in the end 
I'll be looking for a new job...still curious about lucene scaling to 30 
million docs with a sort on every search though (yes I know the sort is 
cached...worries me too though...the sort will be on multiple and 
different fields depending on what the searcher wants...uggg...the size 
of the caches....)
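
To put a rough number on the cache-size worry, here is a back-of-the-envelope sketch. It assumes the sort cache keeps one 4-byte int per document for each sorted field, plus the distinct values themselves for string fields, which is how Lucene's FieldCache is generally understood to behave. The class name, method names, the 40-byte per-String overhead, and the example counts (1 million unique values, 20 chars each) are illustrative assumptions, not measurements from the application.

```java
public class SortCacheEstimate {

    // Bytes for one int-per-document array: a numeric sort field,
    // or the per-document ordinal array of a string sort field.
    public static long perDocIntArrayBytes(long maxDoc) {
        return maxDoc * 4L;
    }

    // Rough bytes for the distinct string values of a field.
    // ASSUMPTION: about 40 bytes of object overhead per String plus 2 bytes per char.
    public static long uniqueStringBytes(long uniqueTerms, int avgChars) {
        final long assumedOverheadPerString = 40L;
        return uniqueTerms * (assumedOverheadPerString + 2L * avgChars);
    }

    // Total estimate for one string sort field: ordinals plus distinct values.
    public static long stringFieldBytes(long maxDoc, long uniqueTerms, int avgChars) {
        return perDocIntArrayBytes(maxDoc) + uniqueStringBytes(uniqueTerms, avgChars);
    }

    public static void main(String[] args) {
        long maxDoc = 30_000_000L;       // documents in the archive
        long uniqueTerms = 1_000_000L;   // hypothetical distinct values in a string field
        int avgChars = 20;               // hypothetical average value length

        long intField = perDocIntArrayBytes(maxDoc);
        long stringField = stringFieldBytes(maxDoc, uniqueTerms, avgChars);

        System.out.println("int sort field:    " + intField / (1024 * 1024) + " MB");
        System.out.println("string sort field: " + stringField / (1024 * 1024) + " MB");
    }
}
```

Under these assumptions each sorted field costs on the order of a hundred megabytes or more at 30 million docs, so sorting on several different fields would add up quickly against a 1.5 to 1.6 gig heap. The real figure depends on how many distinct values each field has.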
