lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: solr cloud does not start with many collections
Date Wed, 04 Mar 2015 09:09:49 GMT
On 3/4/2015 1:02 AM, Shawn Heisey wrote:
> Even now, nearly three hours after startup, the Solr log is still
> spitting out thousands of lines that look like this, so I don't think I
> can call it stable:
> 
> INFO  - 2015-03-04 07:35:51.166;
> org.apache.solr.common.cloud.ZkStateReader; Updating data for mycoll1515
> to ver 60
> 
> I'm going to try bringing up the other Solr instance now, and if that
> stabilizes with all shards in the green, I will try to continue adding
> collections.

I've come to one major conclusion about this whole thing, even before I
reach the magic number of 4000 collections:  Running thousands of
collections is not at all practical with SolrCloud currently.  Some
additional conclusions about this setup:

* Stopping and restarting the entire cluster will quite literally take
hours for full stability.  A rolling restart *might* go faster, but
honestly I would not count on that.

* An external ZooKeeper ensemble is absolutely critical.  ZooKeeper
stability is extremely important.

* A lot of heap memory is required, even if the indexes are completely
empty and there is no query/index activity.  Active indexes with data
are going to push that even higher, and will very likely slow down
recovery on server restart.

* Operating system limits for the max number of open files and max
number of processes allowed will need to be reconfigured - these are
settings that are NOT managed by Solr or Jetty.  Configuration may vary
widely between different operating systems.

* Thousands of collections *might* work OK if there are enough servers
so that each one doesn't have more than a couple hundred cores.  This
would need to be tested, and I don't have the available hardware.
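
As a rough illustration of the point about operating system limits, here
is a minimal Python sketch that reads the soft limits for open files and
processes for the current user and warns when they fall below a
threshold.  The threshold numbers and the check_limits name are my own
assumptions for the example, not recommended values from Solr, and the
resource module is only available on Unix-like systems.  It would need to
run as the same user, in the same environment, that starts Solr.

```python
import resource


def check_limits(min_open_files, min_processes):
    """Return warnings for soft limits below the given thresholds.

    The thresholds are hypothetical; pick values that suit your install.
    """
    checks = [("open files", "RLIMIT_NOFILE", min_open_files),
              ("processes", "RLIMIT_NPROC", min_processes)]
    warnings = []
    for label, name, wanted in checks:
        which = getattr(resource, name, None)
        if which is None:
            # Not every platform exposes every limit (e.g. RLIMIT_NPROC).
            continue
        soft, hard = resource.getrlimit(which)
        # RLIM_INFINITY means "no limit", so there is nothing to warn about.
        if soft != resource.RLIM_INFINITY and soft < wanted:
            warnings.append(
                "%s: soft limit %d is below %d" % (label, soft, wanted))
    return warnings


if __name__ == "__main__":
    # Assumed example thresholds, not official recommendations.
    for line in check_limits(min_open_files=65536, min_processes=10000):
        print("WARNING: " + line)
```

How the limits actually get raised (limits.conf, systemd unit settings,
and so on) depends on the operating system, so the sketch only reports
what it finds.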

I'm not sure that the OP's problem can actually be called a bug ... it's
more of a performance limitation.  We should still file an issue and
treat it like a bug, though.

Thanks,
Shawn

