zookeeper-user mailing list archives

From Akmal Abbasov <akmal.abba...@icloud.com>
Subject Re: Transaction timeouts
Date Wed, 18 Nov 2015 10:37:01 GMT

> On 17 Nov 2015, at 21:34, Raúl Gutiérrez Segalés <rgs@itevenworks.net> wrote:
> 
> On 17 November 2015 at 12:13, Akmal Abbasov <akmal.abbasov@icloud.com>
> wrote:
> 
>> Hi Raul,
>> Thank you for your response.
>> I am running zookeeper with -Xms512m -Xmx1g options, is this enough.
>> 
> 
> It depends on your workload.. how many writes/read per sec are you
> expecting/seeing? Are you seeing long
> GC pauses? If so, you'll need more mem or bigger tick times, otherwise
> you'll miss the deadlines for the
> pings (both among learners and to clients…)
> 
Where can I find this information, in particular the read/write rates?
This is the output of the stat command:
Server 1
Latency min/avg/max: 0/66/5212
Received: 8722
Sent: 8694
Connections: 19
Outstanding: 0
Zxid: 0xa9600002ef2
Mode: follower
Node count: 479

Server 2 
Latency min/avg/max: 0/70/5252
Received: 8228
Sent: 8203
Connections: 16
Outstanding: 0
Zxid: 0xa9600002e12
Mode: leader
Node count: 479

Server 3
Latency min/avg/max: 0/0/1
Received: 140
Sent: 139
Connections: 2
Outstanding: 0
Zxid: 0xa9600002bf8
Mode: follower
Node count: 479

All the servers have the same configs.
Is -Xms512m -Xmx1g enough to handle my workload?
Moreover, I see that the load is not evenly distributed. Is this something that should be tuned
manually,
or is there something like the hbase/hdfs balancer that will take care of it?
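
One rough way to turn the stat counters above into a request rate is to sample them twice and divide the difference by the interval. The sketch below is only an illustration: the helper names (fetch_stat, parse_stat, packets_per_second) are made up for this example, the default client port 2181 is assumed, and "Received" counts all packets (pings included), so it does not separate reads from writes.

```python
import socket


def fetch_stat(host, port=2181, timeout=5.0):
    """Send the 'stat' four-letter word to a server and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Collect 'Key: value' lines from stat output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def packets_per_second(first, second, interval_s):
    """Received-packet rate between two parsed samples taken interval_s apart."""
    return (int(second["Received"]) - int(first["Received"])) / interval_s
```

Calling fetch_stat on each server twice, some seconds apart, and passing the parsed results to packets_per_second gives a per-server figure that can be compared across the ensemble.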

> 
>> Regarding the network, all of the server zk server nodes are hosted in the
>> cloud, in the same dc.
>> But according to the zk troubleshooting guide, the timeout should be
>> increased for cloud environments.
>> 
> 
> Yup, latency can be unpredictable in the cloud…
> 
> 
>> One more thing is that, I’m seeing a lot of
>> fsync-ing the write ahead log in SyncThread:1 took 2962ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting guide
>> messages in the logs.
>> 
> 
> That definitely looks bad and will block everything else. What type of disc
> are you writing your logs and snapshots to? Are they
> separate volumes?
I’m using separate disks for logs and data, but they are HDDs, not SSDs.
So my assumption 

I’ve tried to understand what is actually happening; here is a summary of the logs:
08:22:08,201	Transaction timeout
08:22:08,596 - 08:22:25,441	ZookeeperServer not running
08:22:24,927	New election
Everything starts with the ‘Transaction timeout’ on the leader, which causes ‘Exception
when following the leader’ on the learners.
Then all ZooKeeper processes shut down, a new election takes place, and the ZooKeeper processes
start again.
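
To see how often and how badly the fsync warnings quoted earlier occur, the server log can be scanned for them. This is a minimal sketch that assumes the warning text matches the message quoted in this thread; the function names are hypothetical, and the log file path is left to the caller.

```python
import re

# Matches the warning text quoted in this thread, e.g.
# "fsync-ing the write ahead log in SyncThread:1 took 2962ms ..."
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in (\S+) took (\d+)ms")


def fsync_durations(lines):
    """Yield (thread_name, milliseconds) for every fsync warning in lines."""
    for line in lines:
        match = FSYNC_RE.search(line)
        if match:
            yield match.group(1), int(match.group(2))


def summarize(lines):
    """Return the number of fsync warnings and the slowest one in ms."""
    durations = [ms for _, ms in fsync_durations(lines)]
    return {"count": len(durations), "max_ms": max(durations, default=0)}
```

Passing an open log file to summarize (a file object iterates line by line) gives a quick count of slow syncs and the worst case, which can then be compared with the configured tick time.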

One more thing: what’s the best way to update the configs without downtime?
Thank you.

Regards, Akmal
