lucene-java-user mailing list archives

From Eric Bowman
Subject Re: Scaling
Date Fri, 18 Jul 2008 07:49:12 GMT
Jason Rutherglen wrote:
> The scaling per machine should be linear.  The overhead from the network is
> minimal because the Lucene objects being passed around are small.  Google
> mentions in one of their early white papers on scaling that they have
> sub-indexes, now popularly called shards, each of which is searched by an
> individual thread.  Executed in parallel (ParallelMultiSearcher, which does
> not use thread pooling), the response time will be faster than with a single
> thread, assuming part of the indexes are in the system cache.
> A query is simply an iteration, so it is easy to see how parallelization
> speeds up response times.  Queries per second should ideally be addressed by
> adding more hardware with the same indexes on each server, then further
> dividing these into what can be termed cells, which represent different
> indexes on sets of servers.
> Having a large index on a single machine does not scale well because most of
> the index will not be in the system cache.  As the index grows, so does the
> response time.  Dividing the index up into shards and cells allows for
> efficient scaling, which is proven at the big G.  It puts more of the total
> index in the system cache of many machines.
> The general assumption is that hardware is cheap and can be added easily, so
> search systems can take advantage of this and parallelize as much as
> possible, per server, per application.
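
For illustration only, the shard fan-out described above can be sketched
roughly as follows. This is a toy in Python, not Lucene code: the names
search_shard and search_all, the in-memory dict "shards", and the raw
term-count score are all assumptions made up for the example. It shows only
the shape of the approach (one task per shard, then a merge of the per-shard
top hits); in Python the threads will not give a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_shard(shard, term, k):
    """Score every document in one shard by raw term count; return its top k."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    hits = [hit for hit in hits if hit[0] > 0]  # drop non-matching documents
    return heapq.nlargest(k, hits)


def search_all(shards, term, k):
    """Search each shard in its own task, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    return heapq.nlargest(k, (hit for part in partials for hit in part))


# Two hypothetical shards, each mapping a document id to its text.
shards = [
    {"a1": "lucene search lucene", "a2": "index shards"},
    {"b1": "lucene scaling", "b2": "lucene lucene lucene cache"},
]
top = search_all(shards, "lucene", 2)
print(top)
```

The merge step is only meaningful if scores from different shards are
comparable, which they are here because the score is a plain count.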
One thing I have trouble understanding is how scoring works in this 
case.  Does Lucene really "just work", or are there special things we 
have to do to make sure that the scores are coherent so we can actually 
decide which was the best match?  What kind of constraints are there 
when breaking up the index into parts to make sure scoring remains coherent?
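
To make the concern concrete, here is a minimal sketch (toy Python, not
Lucene code) of how a document-frequency-based weight can differ from shard
to shard. The idf formula below is a smoothed form, 1 + ln(N / (df + 1)),
chosen for illustration; the helper names and the tiny shards are invented
for the example, and I am not claiming this is what any particular searcher
class does internally.

```python
import math


def idf(doc_freq, num_docs):
    """Smoothed inverse document frequency: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def doc_freq(shard, term):
    """Number of documents in the shard that contain the term."""
    return sum(1 for text in shard.values() if term in text.split())


# Hypothetical shards: the term is common in the first and rare in the second.
shards = [
    {"a1": "lucene index", "a2": "lucene cache", "a3": "lucene shard"},
    {"b1": "lucene scoring", "b2": "network overhead", "b3": "thread pool"},
]
term = "lucene"

# Weight computed separately inside each shard.
local_idf = [idf(doc_freq(s, term), len(s)) for s in shards]

# Weight computed from statistics summed over all shards.
global_df = sum(doc_freq(s, term) for s in shards)
global_n = sum(len(s) for s in shards)
global_idf = idf(global_df, global_n)

print(local_idf, global_idf)
```

With purely local statistics the same term gets a different weight in each
shard, so two otherwise identical documents would score differently
depending on where they live. Using the summed statistics gives one weight
for all shards.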


Eric Bowman
Boboco Ltd
