accumulo-user mailing list archives

From "Slater, David M." <David.Sla...@jhuapl.edu>
Subject RE: Performance during node failure
Date Fri, 08 Nov 2013 22:27:06 GMT
That makes a lot of sense.

This only happened when the nodes were getting pounded, so it was likely a JVM being swapped out.

From: Eric Newton [mailto:eric.newton@gmail.com]
Sent: Friday, November 08, 2013 3:24 PM
To: user@accumulo.apache.org
Subject: Re: Performance during node failure

I currently have zookeeper running on all 7 data nodes

If you ever grow your cluster, you shouldn't keep running more zookeepers.
Adding zookeepers slows down zookeeper writes.
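
A rough sketch of the arithmetic behind this: a ZooKeeper write is committed only once a majority of the ensemble has acknowledged it, so every server you add raises the number of acknowledgements each write waits for. The function names below are made up for illustration and are not part of any ZooKeeper or Accumulo API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few common ensemble sizes, including the 7 mentioned above.
for n in (3, 5, 7):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failures")
```

With 7 servers each write needs 4 acknowledgements, compared with 2 for a 3-server ensemble, which is the trade-off for tolerating more failures.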


with the batchwriters running on the name node. Basically, I was getting a number of the following:
client session timed out ...
opening socket connection
socket connection established
session establishment complete
...
client session timed out ...
repeat

I would also occasionally get
session expired for /accumulo/fe7...
as well as
Zookeeper.KeeperException$ConnectionLoss
Exception: KeeperErrorCode = ConnectionLoss
for /accumulo/f37.../tables/3b/state
at accumulo.core.zookeeper.ZooCache$2.run
accumulo.core.zookeeper.ZooCache.retry
accumulo.core.zookeeper.ZooCache.get
core.clientimpl.tables.getTableState
core.clientimpl.multiTableBatchWriter.getBatchWriter
myIngestorProcess.run



Does anyone know if this is an Accumulo problem, a Zookeeper problem, or something else (network
overly busy, etc.)?


This happens when:

1) a jvm swaps out
2) a jvm does stop-the-world garbage collection
3) there is a network disconnect/interruption

By far, the biggest reason for a lost zookeeper session is that either the tablet server or
the zookeeper process has been pushed into swap.

Make sure that swappiness is set to zero, that you have ample memory for all your processes,
and set the env variable MALLOC_ARENA_MAX to 1:

export MALLOC_ARENA_MAX=1
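
A minimal sketch for checking both settings on a Linux node, assuming swappiness is exposed at /proc/sys/vm/swappiness. The helper name check_settings is hypothetical, and the script only sees its own environment, so run it from the same shell or init script that launches the tablet server and zookeeper.

```python
import os
from pathlib import Path

# Assumption: Linux exposes the current swappiness value at this path.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def check_settings(swappiness_text, env):
    """Return a list of warnings for the two settings discussed above."""
    warnings = []
    if swappiness_text is None:
        warnings.append("could not read vm.swappiness")
    elif int(swappiness_text.strip()) != 0:
        warnings.append(
            f"vm.swappiness is {swappiness_text.strip()}, expected 0")
    if env.get("MALLOC_ARENA_MAX") != "1":
        warnings.append("MALLOC_ARENA_MAX is not set to 1")
    return warnings


if __name__ == "__main__":
    text = SWAPPINESS_PATH.read_text() if SWAPPINESS_PATH.exists() else None
    for warning in check_settings(text, os.environ):
        print("WARNING:", warning)
```

No output means both settings look right for the process that ran the check.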


