hbase-user mailing list archives

From Kazuki Ohta <kazuki.o...@gmail.com>
Subject massive zk expirations under heavy network load
Date Wed, 20 Apr 2011 18:41:30 GMT

I'm now running CDH3u0 on a 16-node cluster (hdp0-hdp15).
The configuration is below.

hdp0: zk + master + region + nn + dn + jt + tt
hdp1: zk + master + region + snn + dn + tt
hdp2: zk + region + dn + tt
hdp3 to hdp15: region + dn + tt

Usually, it works really well. But once a user submits a MapReduce
job that requires massive network transfer in the shuffle phase,
the master gets a zk session timeout exception and fails over to
another master.

The problem is that shuffle network traffic saturates the switch,
and important zk packets are not delivered in time.
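
To illustrate the failure mode, here is a minimal sketch in Python. It is
a simplified model, not ZooKeeper's actual expiry logic: it assumes a
session expires once the gap between two heartbeats reaching the server
exceeds the session timeout. The function name and all timing values are
hypothetical, not measurements from this cluster.

```python
def session_expires(heartbeat_arrivals_ms, session_timeout_ms):
    """Return True if any gap between consecutive heartbeat arrivals
    (starting from session creation at t=0) exceeds the session timeout.

    Simplified model: real ZooKeeper servers track expiry in tick-based
    buckets, but the effect of a long gap is the same.
    """
    last_seen = 0
    for arrival in heartbeat_arrivals_ms:
        if arrival - last_seen > session_timeout_ms:
            return True
        last_seen = arrival
    return False


# Hypothetical numbers: heartbeats sent every 2 s, 6 s session timeout.
quiet_network = [2000, 4000, 6000, 8000]
# During the shuffle, one heartbeat is held up until t=16 s.
congested_network = [2000, 4000, 16000]

print(session_expires(quiet_network, 6000))
print(session_expires(congested_network, 6000))
```

Under this model, a single heartbeat delayed past the timeout is enough
to expire the session, which matches what the master and the tasks see.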

Even ganglia monitoring seems to stop at that time. The mr task
attempts also get zk session timeouts and die all together (about
100 tasks die at the same time; input and output are both hbase).

This is a potential problem when running MapReduce jobs alongside
HBase. Does anyone know a good solution for this?

Of course I should isolate hbase-master from the task trackers. This
would avoid the hbase-master failover problem, but it cannot keep the
mr tasks from getting zk session expirations all together.


Kazuki Ohta: http://kzk9.net/
