From: Shawn Heisey
Date: Tue, 19 Nov 2013 08:35:23 -0700
To: solr-user@lucene.apache.org
Subject: Re: Question regarding possibility of data loss

On 11/19/2013 6:18 AM, adfel70 wrote:
> Hi, we plan to establish an ensemble of solr with zookeeper.
> We gonna have 6 solr servers with 2 instances on each server, also we'll
> have 6 shards with replication factor 2, in addition we'll have 3
> zookeepers.

You'll want to do one Solr instance per machine.  Each Solr instance can
house many cores (shard replicas).  Running more than one instance per
machine will:

1) Add memory/CPU overhead.
2) Easily and accidentally lead to multiple replicas of a single shard
being located on the same machine.

> Our concern is that we will send documents to index and solr won't index
> them but won't send any error message and we will suffer a data loss
>
> 1. Is there any situation that can cause this kind of problem?
> 2. Can it happen if some of ZKs are down? or some of the solr instances?
> 3. How can we monitor them? Can we do something to prevent these kind of
> errors?
1) If it ever does become possible for data loss to occur without
notifying your application, it will be considered a very serious bug,
and top priority will be given to fixing it.  A release with the fix
will be made as quickly as possible.  Of course I cannot guarantee that
such bugs don't exist, but I am not aware of any at the moment.

2) You must have a majority (floor(n/2) + 1) of zookeepers operational.
If you have three or four zookeepers, one zookeeper can be down and
SolrCloud will continue to function perfectly.  With five or six
zookeepers, two can be down.  With seven or eight, three can be down.

As far as Solr itself, if one replica of each shard in a collection is
working, then the entire collection will work.  That means you'll want
at least replicationFactor=2, so there are two copies of each shard.

3) There are MANY options for monitoring.  Many of them are completely
free, and it is always possible to write your own.  One high-level thing
you can do is make sure the hosts are up and that they are running the
proper number of java processes.  Solr offers a number of API entry
points that will tell you how things are working, and more are added
over time.  I don't think there are any zookeeper-specific informational
capabilities at the moment, but I did file a bug report asking for the
feature.  When I have some time, I will work on a fix for it.  One of
the other committers may decide to work on it as well.

If you want out-of-the-box Solr-specific monitoring and are willing to
pay for it, Sematext offers SPM.  One of Sematext's employees is very
active on this list, and they just added ZooKeeper monitoring to their
capabilities.  They do have a free version, but it has extremely limited
monitoring history.

http://sematext.com/

Thanks,
Shawn
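[Editor's note: the quorum arithmetic in answer 2 can be sketched as a
small helper.  This is a minimal illustration of the standard ZooKeeper
majority rule (floor(n/2) + 1 nodes must be up); the function name is
ours, not part of any Solr or ZooKeeper API.]

```python
def tolerated_failures(ensemble_size: int) -> int:
    """How many ZooKeeper nodes can fail before quorum is lost.

    ZooKeeper needs a strict majority of the ensemble running:
    quorum = floor(n/2) + 1, so n - quorum nodes may be down.
    """
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum

# Matches the answer above: 3 or 4 nodes tolerate 1 failure,
# 5 or 6 tolerate 2, 7 or 8 tolerate 3.
for n in range(3, 9):
    print(f"{n} zookeepers: quorum={n // 2 + 1}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Note that an even-sized ensemble tolerates no more failures than the
odd size below it, which is why odd ensemble sizes (3, 5, 7) are the
usual recommendation.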
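[Editor's note: as one concrete form of the "write your own monitoring"
suggested in answer 3, here is a sketch that polls each Solr node's
per-core ping handler (`/solr/<core>/admin/ping`).  The host list and
core name are hypothetical placeholders; substitute your own.]

```python
import json
import urllib.request

# Hypothetical hosts and core name -- replace with your own deployment.
SOLR_HOSTS = ["solr1:8983", "solr2:8983", "solr3:8983"]
CORE = "collection1"


def ping_ok(body: bytes) -> bool:
    """Parse a ping-handler JSON response; "status": "OK" means healthy."""
    try:
        return json.loads(body).get("status") == "OK"
    except ValueError:
        return False


def ping(host: str, core: str, timeout: float = 5.0) -> bool:
    """Hit the core's ping handler; False on any network or HTTP error."""
    url = f"http://{host}/solr/{core}/admin/ping?wt=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ping_ok(resp.read())
    except OSError:
        return False


def check_cluster() -> dict:
    """Map each host to True/False depending on ping success."""
    return {host: ping(host, CORE) for host in SOLR_HOSTS}
```

A cron job or monitoring agent could call check_cluster() and alert when
any host reports False; combined with replicationFactor=2, a single
False is degraded but not an outage.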