accumulo-user mailing list archives

From Keith Turner <>
Subject Re: Losing tservers - Unusually high Last Contact times
Date Tue, 20 May 2014 01:20:05 GMT
On Mon, May 19, 2014 at 6:56 PM, <> wrote:

> You are hitting the zookeeper timeout, default 30s I believe. You said you
> are not oversubscribed for memory, but what about CPU? Are you running YARN
> processes on the same nodes as the tablet servers? Is the tablet server
> being pushed into swap or starved of CPU?
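One way to check whether a node is being starved of CPU or pushed into swap (both plausible when every node is a VM) is a hiccup-style probe. It sleeps for a short, fixed interval in a loop and records any wakeup that arrives much later than requested. The sketch below assumes Python 3 running on the tserver or zookeeper host. `detect_stalls` and its parameters are hypothetical names. It only sees host-level stalls, not GC pauses inside the tserver JVM.

```python
import time


def detect_stalls(duration_s=60.0, interval_s=0.01, threshold_s=0.5):
    """Return (wall_clock_time, lateness_s) pairs for wakeups late by more than threshold_s.

    A large lateness means this process could not run for that long, e.g. because
    the host was swapping, CPU-starved, or the VM was descheduled.
    """
    stalls = []
    end = time.monotonic() + duration_s
    while True:
        start = time.monotonic()
        if start >= end:
            break
        time.sleep(interval_s)
        # How much longer than requested the sleep actually took.
        lateness = time.monotonic() - start - interval_s
        if lateness > threshold_s:
            stalls.append((time.time(), lateness))
    return stalls
```

For example, calling `detect_stalls(duration_s=300)` on a tserver node while the cluster is under load and seeing stalls of several seconds or more would point at the host rather than at Accumulo itself. Stalls approaching the 30s session timeout would be enough to explain SESSION_EXPIRED.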

Also check the zookeeper server nodes. Is Java GC pausing the tservers or the
zookeeper servers?
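To answer the GC question, the tserver and zookeeper JVMs can report their stop-the-world pauses. Assume HotSpot is started with `-XX:+PrintGCApplicationStoppedTime`. In that case each pause produces a line beginning "Total time for which application threads were stopped:" wherever the JVM's GC output is written. The sketch below pulls out pauses at or above a threshold. `long_pauses` is a hypothetical helper, not part of Accumulo or ZooKeeper.

```python
import re

# Matches HotSpot's -XX:+PrintGCApplicationStoppedTime output, e.g.
# "Total time for which application threads were stopped: 0.0123450 seconds"
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Return the stop-the-world pause lengths (seconds) that are >= threshold_s."""
    pauses = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_s:
                pauses.append(seconds)
    return pauses
```

For example, running `long_pauses(open(path), 5.0)` over a tserver's GC output, where `path` is wherever that log lives, would list any pauses long enough to threaten a 30s zookeeper session.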

> -----Original Message-----
> From: thomasa []
> Sent: Monday, May 19, 2014 4:22 PM
> To:
> Subject: Losing tservers - Unusually high Last Contact times
> Hello all,
> I am having issues with tablet servers going down due to poor contact times
> (my hypothesis, at least). In the past I have run smaller clouds (20-40
> nodes) stably, but have run into issues with a larger number of nodes
> (150+). Each node is a datanode, nodemanager, and tablet server.
> There is a master node that is running the hadoop namenode, hadoop resource
> manager and accumulo master, monitor, etc. There are three zookeeper nodes.
> All nodes are vms. This same setup is used on the smaller, stable clouds as
> well.
> I do not believe memory allocation is an issue as I have only given
> hadoop/yarn (2.2.0) and accumulo (1.5.1) less than half of the available
> memory. The FATAL errors I have seen are:
> Lost tablet server lock (reason = SESSION_EXPIRED), exiting
> Lost ability to monitor tablet server lock, exiting
> Other than bumping up the rpc timeout (which I have done, but I would rather
> find the root cause of the problem), I have run out of ideas on how to solve
> this issue.
> Does anyone have any insight into why I would be seeing such bad response
> times between nodes? Are there any configuration parameters I can play with
> to fix this?
> I realize this is a very general question, so let me know if there is any
> information I can provide to help clarify the issue.
> Thank you in advance for your time.
> Thomas
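On the configuration side, SESSION_EXPIRED relates to the zookeeper session timeout rather than the rpc timeout. In Accumulo this is `instance.zookeeper.timeout`, the 30s default mentioned in the reply. The zookeeper server, however, clamps whatever timeout a client requests into the range [minSessionTimeout, maxSessionTimeout]. When those are unset, they default to 2 and 20 times `tickTime`. The sketch below models that clamping. The function name is hypothetical. With `tickTime=2000` (the value in ZooKeeper's sample config), any requested timeout above 40s is silently capped unless maxSessionTimeout is raised in zoo.cfg.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_session_timeout_ms=None,
                                  max_session_timeout_ms=None):
    """Model how a ZooKeeper server clamps a client's requested session timeout."""
    # Unset bounds default to 2x and 20x tickTime on the server.
    lo = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    hi = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))


print(negotiated_session_timeout_ms(30_000))  # 30000: the default fits within the bounds
print(negotiated_session_timeout_ms(60_000))  # 40000: capped at 20 * tickTime
print(negotiated_session_timeout_ms(60_000, max_session_timeout_ms=120_000))  # 60000
```

Raising the timeout only buys headroom. If the probes above show long pauses, those pauses are the root cause worth fixing.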
