lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: 30 million+ docs on a single server
Date Sun, 13 Aug 2006 05:15:07 GMT
This is unlikely to work well or fast.  It will depend on the size of the index (not in terms
of the number of docs, but its physical size on disk), the number of queries per second, and the
desired query latency.  If you can wait 10 seconds for a query to return, and if only a few queries
are hitting the server at any one time, then you may be OK.  Keeping things up to date while also
sorting on a field rather than by relevance will be quite tough.  The FieldCache that sorting relies
on will consume some RAM, and warming it up will take some number of seconds.  Re-opening an
IndexSearcher after index changes will also cost you a bit of time.
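
One common way to keep the index searchable while paying the reopen and warm-up cost described above is to open and warm the new searcher off to the side, then swap it in atomically, so queries never hit a cold searcher. The sketch below is plain Java and deliberately does not use Lucene's API: `Searcher` is a hypothetical stand-in for an IndexSearcher, and the `warmer` callback stands for whatever warming you choose (for example, running one sorted query per sort field so its FieldCache is filled before real traffic arrives).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a Lucene IndexSearcher (not Lucene's API).
interface Searcher {
    String search(String query);

    void close();
}

// Holds the live searcher; queries only ever see a fully warmed instance.
// Assumes reopen() is called from a single background thread at a time.
final class SearcherHolder {
    private final AtomicReference<Searcher> current;

    SearcherHolder(Searcher initial) {
        current = new AtomicReference<>(initial);
    }

    // Callers fetch the current searcher once per query.
    Searcher get() {
        return current.get();
    }

    // Opens a new searcher after index changes, warms it while the old one keeps
    // serving queries, then publishes it with a single atomic swap.
    void reopen(Supplier<Searcher> opener, Consumer<Searcher> warmer) {
        Searcher fresh = opener.get();
        warmer.accept(fresh); // e.g. run one sorted query per sort field
        Searcher old = current.getAndSet(fresh);
        // Simplification: closes immediately. A real system should wait until
        // in-flight queries on `old` have finished (e.g. via reference counting).
        old.close();
    }
}
```

The design choice here is that warming cost is paid before the swap, so query latency is not affected by it; the price is that two searchers (and their caches) exist in memory at the same time during a reopen.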

Consider a 64-bit server with more RAM that allows larger Java heaps, and try to fit your
index into RAM.
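
A quick way to sanity-check the "fit your index into RAM" advice is to compare the on-disk size of the index directory with the JVM's maximum heap. The sketch below is plain Java; the 50% headroom factor is an assumption chosen for illustration (the heap must also hold sort caches, query objects and the servlet container), not a Lucene rule. Actually loading the index into memory with Lucene (for example by copying an FSDirectory into a RAMDirectory) is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class HeapCheck {
    // Assumption: the index should use at most half the heap, leaving room for
    // sort caches, query objects and the servlet container itself.
    static final double MAX_INDEX_FRACTION_OF_HEAP = 0.5;

    // Sums the sizes of all regular files under the index directory.
    static long indexSizeBytes(Path indexDir) throws IOException {
        long total = 0;
        try (Stream<Path> paths = Files.walk(indexDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }

    // True if an index of this size fits within the assumed share of the heap.
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes) {
        return indexBytes <= (long) (maxHeapBytes * MAX_INDEX_FRACTION_OF_HEAP);
    }

    // Usage: java HeapCheck /path/to/index
    public static void main(String[] args) throws IOException {
        long heap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx setting
        long index = indexSizeBytes(Paths.get(args[0]));
        System.out.printf("index=%d MiB, max heap=%d MiB, fits=%b%n",
                index >> 20, heap >> 20, fitsInHeap(index, heap));
    }
}
```

With a heap capped around 1.5 GB (1536 MiB), the assumed 50% rule leaves room for an index of only about 768 MiB, which is why a 64-bit JVM with a larger -Xmx matters for this plan.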

Otis

----- Original Message ----
From: Mark Miller <markrmiller@gmail.com>
To: java-user@lucene.apache.org
Sent: Saturday, August 12, 2006 7:45:15 PM
Subject: Re: 30 million+ docs on a single server

The single server is important because I think it will take a lot of
work to scale it to multiple servers. The index must allow for close to
real-time updates and additions. It must also remain searchable at all
times (other than during the brief period of single updates and
additions). If it is easy to scale this to multiple servers, please tell
me how.
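
On scaling out: the usual approach is to split the documents into shards (one index per server), send each query to every shard, and merge each shard's top hits by the sort value. Lucene's MultiSearcher and ParallelMultiSearcher (with RemoteSearchable for indexes on other machines) can do this kind of merging; the sketch below is not that API, just plain Java showing the two pieces involved: routing a document to a shard, and merging per-shard top-n lists. `Hit`, `shardFor` and `mergeTopN` are hypothetical names, and hash-based routing is one assumed choice among several.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit type: a document id plus the value of the sort field.
final class Hit {
    final String docId;
    final long sortValue;

    Hit(String docId, long sortValue) {
        this.docId = docId;
        this.sortValue = sortValue;
    }
}

final class ShardRouting {
    // Assumed hash routing: a given id always maps to the same shard, so
    // updates and deletes for that document go to exactly one server.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Each shard must return its own top-n hits sorted ascending by sortValue
    // (or all of its hits if it has fewer). A k-way merge then yields the
    // global top-n without re-sorting everything.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        // Queue entries are {shardIndex, positionWithinShard}.
        PriorityQueue<int[]> pq = new PriorityQueue<>(
                (a, b) -> Long.compare(perShard.get(a[0]).get(a[1]).sortValue,
                                       perShard.get(b[0]).get(b[1]).sortValue));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                pq.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>(n);
        while (merged.size() < n && !pq.isEmpty()) {
            int[] top = pq.poll();
            List<Hit> shard = perShard.get(top[0]);
            merged.add(shard.get(top[1]));
            if (top[1] + 1 < shard.size()) {
                pq.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

Because each shard holds only a fraction of the 30 million documents, its sort cache and reopen cost shrink in proportion, which is the main attraction of horizontal scaling here.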

- Mark
> Why is a single server so important?  I can scale horizontally much
> cheaper than I can scale vertically.
>
>
>
> On 8/11/06, Mark Miller <markrmiller@gmail.com> wrote:
>>
>> I've made a nice little archive application with Lucene. I made it to
>> handle our largest need: 2.5 million docs or so on a single server. Now
>> the powers that be say: let's use it for a 30+ million document archive
>> on a single server! (Each doc is maybe 10k max... as small as 1 or
>> 2k.) Please tell me why we are in trouble... please tell me why we are
>> not. I have tested up to 2 million docs without much trouble, but 30
>> million... the average search will include a sort on a field as
>> well... can I search 30+ million docs with a sort? Man, am I worried about
>> that. Maybe the server will have 8 procs and 12 billion gigs of RAM.
>> Maybe. Even so, Tomcat seems to be able to launch with a max of only 1.5
>> or 1.6 GB of RAM on Windows. What do you think? 30 million+ sounds like
>> too much of a load to me for a single server. Not that they care what I
>> think... I only wrote the thing (man, I hate my job, offer me a new one :) )
>> ...please... comments?
>>
>> Cheers,
>>
>> Miserable Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
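
Whether 30+ million docs can be sorted within a heap of roughly 1.5 GB is largely arithmetic. In Lucene 2.x, sorting on an int or float field caches one 4-byte value per document, while sorting on a string field caches a 4-byte ordinal per document plus an array of the field's unique values. The sketch below estimates those sizes; the bytes-per-unique-term figure is a rough assumption, and JVM object overhead varies, so treat the results as lower bounds rather than measurements.

```java
final class SortCacheEstimate {
    // Each cached per-document slot (int ordinal, int or float) is 4 bytes.
    static final long BYTES_PER_DOC = 4;

    // Int/float sort field: one 4-byte slot per document.
    static long numericSortBytes(long maxDoc) {
        return maxDoc * BYTES_PER_DOC;
    }

    // String sort field: a 4-byte ordinal per document plus the unique values.
    // bytesPerUniqueTerm is an assumed average (characters plus String object
    // and array-slot overhead), not a measured figure.
    static long stringSortBytes(long maxDoc, long uniqueTerms, long bytesPerUniqueTerm) {
        return maxDoc * BYTES_PER_DOC + uniqueTerms * bytesPerUniqueTerm;
    }

    public static void main(String[] args) {
        long docs = 30_000_000L;
        long mib = 1L << 20;
        System.out.println("int sort:    " + numericSortBytes(docs) / mib + " MiB");
        // Hypothetical field: 1 million distinct values at ~80 bytes each.
        System.out.println("string sort: " + stringSortBytes(docs, 1_000_000L, 80) / mib + " MiB");
    }
}
```

For 30 million documents the per-document array alone is about 114 MiB per sort field. While a new searcher is being opened and warmed, the old and new caches can coexist, so budgeting roughly twice that per sort field is prudent, which is a sizeable share of a 1.5 GB heap.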

