hbase-user mailing list archives

From Stanley Xu <wenhao...@gmail.com>
Subject Will different Bcast address impact the communication between the cluster?
Date Sat, 14 May 2011 11:39:50 GMT
Dear all,

We have run into a problem with HBase since a recent network update.
About 3-4 hours after the cluster starts up, some of the RegionServers
try to read data from a block that has already been deleted.

If we restart the cluster, the problem goes away, and no data is missing.

A detailed description of the problem can be found at
http://search-hadoop.com/m/ZpgJ623GoyU1/.META.+inconsistency&subj=The+META+data+inconsistency+issue

I have now found something questionable in the network configuration of
our cluster. Some of the cluster nodes have a different broadcast address
and netmask from the other nodes. For example, as shown below,
hadoopsh11092 uses Bcast 10.255.255.255 and Mask 255.0.0.0, while
hadoopsh11103 uses Bcast 10.0.2.255 and Mask 255.255.255.0.

hadoopsh11092
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:EE:C1:7C
          inet addr:10.0.2.19  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::2a0:d1ff:feee:c17c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1864321949 errors:0 dropped:1465 overruns:0 frame:0
          TX packets:1867202791 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1811900116811 (1.6 TiB)  TX bytes:1879509303203 (1.7 TiB)
          Memory:face0000-fad00000


hadoopsh11103
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:EE:AE:C4
          inet addr:10.0.2.30  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::2a0:d1ff:feee:aec4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1726779928 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1716762766 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1804202744690 (1.6 TiB)  TX bytes:1824085255121 (1.6 TiB)
          Memory:face0000-fad00000
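
To make the mismatch concrete, here is a minimal sketch that derives the
network and broadcast address from each node's address and mask, using the
values copied from the ifconfig output above. It uses only Python's standard
ipaddress module; the helper name describe() is just for illustration and is
not part of any cluster tooling.

```python
import ipaddress

# Address and netmask pairs copied from the ifconfig output above.
nodes = {
    "hadoopsh11092": ("10.0.2.19", "255.0.0.0"),
    "hadoopsh11103": ("10.0.2.30", "255.255.255.0"),
}


def describe(addr, mask):
    """Return (network, broadcast address) implied by an address and netmask."""
    iface = ipaddress.ip_interface(f"{addr}/{mask}")
    return iface.network, iface.network.broadcast_address


for host, (addr, mask) in nodes.items():
    net, bcast = describe(addr, mask)
    print(f"{host}: network={net} broadcast={bcast}")

# Does each node's configured network contain the other node's address?
net_92, _ = describe(*nodes["hadoopsh11092"])
net_103, _ = describe(*nodes["hadoopsh11103"])
print(ipaddress.ip_address("10.0.2.30") in net_92)   # hadoopsh11103 as seen by hadoopsh11092
print(ipaddress.ip_address("10.0.2.19") in net_103)  # hadoopsh11092 as seen by hadoopsh11103
```

The derived broadcast addresses match what ifconfig reports for each node
(10.255.255.255 for the /8 mask and 10.0.2.255 for the /24 mask). Each
node's network also contains the other node's address, so the two nodes
disagree only about the size of the subnet and its broadcast address.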

Even with these settings, the cluster starts up successfully and works
fine afterwards; the problem only appears after 3-4 hours. I can also
connect between the machines over SSH using their host names without any
problem.

I know that ZooKeeper uses some kind of broadcast during communication. I
am wondering whether our settings should work, or whether they could be
the root cause of our problem.

Thanks in advance.

Best wishes,
Stanley Xu
