hadoop-hdfs-user mailing list archives

From Jameson Li <hovlj...@gmail.com>
Subject why dfsadmin report a dead DT, but this DT is in normal state
Date Tue, 15 Feb 2011 09:36:37 GMT
Hi,

Our cluster is sometimes busy, and some of the slave nodes (every node
runs a DT, TT, regionserver and zookeeper.HQuorumPeer) are under high
load.

Today when I looked at the NN web UI (dfshealth.jsp), I found a dead
DT. But when I logged in to this node, everything seemed normal on it.
At the same time, the JT could reach this TT, but the NN could not
reach this DT (it marked it as dead), HBase could not reach this
regionserver, and ganglia showed this server as down.

After a while, ganglia and the JT showed this server in a normal
state, but the NN and the HBase master still did not. I could log in
to this server the whole time.

I think that when someone submits a big job, one of the DTs gets so
heavily loaded that the NN does not receive the heartbeat packets sent
by this DT. But after a while, even though this DT is back to a normal,
low-load state, the NN still does not receive the heartbeats it sends.
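
For reference, a rough sketch of the dead-node check involved, assuming
the NN uses the usual expiry rule (2 x recheck interval + 10 x heartbeat
interval) with the default values of 5 minutes and 3 seconds. These
values are assumptions, not read from this cluster's configuration, and
the function names are hypothetical:

```python
# Sketch of the NameNode's dead-node rule. The constants are assumed
# Hadoop defaults, not values taken from this cluster.

HEARTBEAT_INTERVAL_S = 3             # assumed dfs.heartbeat.interval (seconds)
RECHECK_INTERVAL_MS = 5 * 60 * 1000  # assumed heartbeat.recheck.interval (ms)


def heartbeat_expire_interval_ms(heartbeat_interval_s=HEARTBEAT_INTERVAL_S,
                                 recheck_interval_ms=RECHECK_INTERVAL_MS):
    """Time without any heartbeat after which a node is treated as dead."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


def is_dead(last_heartbeat_ms, now_ms, expire_ms=None):
    """Hypothetical helper: True if the last heartbeat is older than the window."""
    if expire_ms is None:
        expire_ms = heartbeat_expire_interval_ms()
    return now_ms - last_heartbeat_ms > expire_ms


if __name__ == "__main__":
    expire = heartbeat_expire_interval_ms()
    print("expiry window: %.1f minutes" % (expire / 60000.0))
```

Under these assumptions the NN would have to miss every heartbeat from
the DT for about 10.5 minutes before marking it dead.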

I don't know why.

Can someone help me?

Thanks,
Jameson Li.
