accumulo-user mailing list archives

From: Josh Elser
Subject: Re: Losing tservers - Unusually high Last Contact times
Date: Tue, 20 May 2014 15:19:54 GMT

On 5/20/14, 10:21 AM, thomasa wrote:
> I was worried about how many connections would be open on the larger cloud,
> so I significantly reduced the number of YARN processes. Side question: does
> each worker node have a connection with every other node?

Are you referring to the YARN processes or Accumulo processes? For YARN, 
I believe the container will primarily be communicating back to the RM 
for MapReduce, but a custom app could be doing anything.

For Accumulo, a tserver will mostly be communicating only with the 
master. I know this isn't entirely true, though. For example, tservers 
will communicate with other tservers as part of bulk-importing.

> If they did, my
> guess was that there would be significantly more open connections on a 150+
> node cloud than a 40 node cloud. For that reason, I only have 2 YARN
> processes with 2gb memory each on the larger cloud that is seeing the
> issues. My thought was that each YARN process needs a core, the tablet
> server needs a core, and OS stuff could probably use a core.

Yes, you should most definitely be leaving headroom on a system for the 
operating system. A core and 1G of RAM is probably a good starting 
point, but YMMV.

To increase the ZooKeeper timeout, you can try the following, but it will 
have other implications, such as failure detection/recovery being slower:

In accumulo-site.xml: set instance.zookeeper.timeout equal to something 
like 45s or 60s (default is 30s as Dave mentioned earlier).
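
A minimal sketch of that property, assuming the usual Hadoop-style 
<property> layout of accumulo-site.xml (60s is just one of the example 
values above, not a recommendation):

```xml
<!-- accumulo-site.xml: raise the ZooKeeper session timeout from the 30s default -->
<property>
  <name>instance.zookeeper.timeout</name>
  <value>60s</value>
</property>
```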

In zoo.cfg: set maxSessionTimeout equal to the above, but in 
milliseconds, e.g. 45000 or 60000.
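
As a sketch, the matching zoo.cfg line for a 60s Accumulo timeout might 
look like the following. If I recall correctly, when maxSessionTimeout is 
unset ZooKeeper caps sessions at 20 times tickTime (40000 ms with a 
tickTime of 2000), which is why the cap has to be raised along with the 
Accumulo setting.

```
# zoo.cfg: upper bound on the session timeout a client may negotiate, in ms
# (60000 ms = 60s, matching instance.zookeeper.timeout in accumulo-site.xml)
maxSessionTimeout=60000
```

The ZooKeeper servers will need a restart to pick up the change.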
