lucene-solr-user mailing list archives

From: Shawn Heisey
Subject: Re: Solr Cloud, 100 shards, shards progressively become slower
Date: Thu, 08 Jan 2015 15:32:23 GMT

On 1/8/2015 7:26 AM, Andrew Butkus wrote:
> Hi, we have 8 Solr servers, split 4x4 across 2 data centers.
>
> We have a collection of around ½ billion documents, split over 100
> shards; each shard is replicated 4 times on separate nodes (evenly
> distributed across both data centers).
>
> The problem we have is that when we use cursorMark, the time it takes
> to query each shard gets progressively longer when distrib=true.  (The
> pattern is the same without cursorMark, just shorter in time.)  I have
> tried querying shards directly (with shards=), selecting my own shards
> to see if it was a bandwidth bottleneck, and performance is normal
> when using pre-defined shards.
>
> Does anyone know why the shards become progressively slower when
> distrib=true?  Or any suggestions on how I can fix it, or how to debug
> the problem further?
>
> I have monitored CPU usage and it never goes above 10% on each server,
> so it's not CPU.  Memory usage is about 4GB out of 16GB, so it's not a
> memory issue either.
>
> I have tried all shard shuffling strategies in case one server was
> being overused, but as above, the CPU never goes above 10%, and when I
> use shards= there are never any query time bottlenecks.

The part about memory usage is not clear.  That 4GB and 16GB could refer
to the operating system view of memory, or the view of memory within the
JVM.  I'm curious about how much total RAM each machine has, how large
the Java heap is, and what the total size of the indexes that live on
each machine is.
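
To make those three numbers concrete, here is a rough Python sketch of how they fit together.  The function names and the example figures are made up for illustration and are not taken from this thread.  It also assumes that whatever RAM is not claimed by the Java heap or other processes is available to the OS disk cache, which is only an upper bound.

```python
import os

GIB = 1024 ** 3


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (e.g. an index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def cache_headroom_bytes(total_ram, java_heap, other_processes=0):
    """RAM left over for the OS disk cache: a rough upper bound, never negative."""
    return max(total_ram - java_heap - other_processes, 0)


def fraction_cacheable(index_bytes, headroom_bytes):
    """Fraction of the on-disk index that could sit in the disk cache (capped at 1.0)."""
    if index_bytes == 0:
        return 1.0
    return min(headroom_bytes / index_bytes, 1.0)


# Hypothetical machine: 16 GiB RAM, 4 GiB Java heap, 60 GiB of index on disk.
headroom = cache_headroom_bytes(16 * GIB, 4 * GIB)
print(fraction_cacheable(60 * GIB, headroom))
```

On a real server you would pass the result of dir_size_bytes() for the directory that holds that machine's index data as index_bytes.  A fraction well below 1.0 suggests that much of the index cannot stay cached.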

Even if the documents are individually very small, 500 million of them
will result in a very large index, so I'm guessing that you don't have
enough RAM on each server to cache your index.

What can happen with a highly sharded index that is too large for
available RAM:  Index data for the initial queries gets read from the OS
disk cache, but as those queries run, the information required for the
shards that come later in the distributed query gets pushed out of the
disk cache, so Solr must go to the disk itself to do those later
queries.  Disks are slow, so whenever the machine has to read from the
disk instead of the cache, Solr will be slow.
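
The eviction effect described above can be illustrated with a toy model.  This is only a sketch: it assumes a strict least-recently-used cache that holds whole shards, which is much simpler than what a real OS disk cache does.  The function name, the shard count, and the slot counts are hypothetical.

```python
from collections import OrderedDict


def count_disk_reads(shard_order, cache_slots):
    """Count cache misses when shards are touched in order under a simple LRU cache."""
    cache = OrderedDict()
    misses = 0
    for shard in shard_order:
        if shard in cache:
            # Cache hit: mark the shard as most recently used.
            cache.move_to_end(shard)
        else:
            # Cache miss: this stands in for a read from disk.
            misses += 1
            cache[shard] = True
            if len(cache) > cache_slots:
                # Evict the least recently used shard.
                cache.popitem(last=False)
    return misses


shards = list(range(100))
two_passes = shards + shards  # query every shard, then query every shard again

print(count_disk_reads(two_passes, cache_slots=100))  # 100: second pass is all hits
print(count_disk_reads(two_passes, cache_slots=25))   # 200: every access misses
```

When the cache can hold every shard, only the first pass touches the disk.  When it holds a quarter of them, each shard has been evicted by the time it is needed again, so every access in both passes is a disk read.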

