hbase-dev mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: replicating to oneself?
Date Fri, 01 Nov 2013 18:12:12 GMT
Himanshu and Nick,

Many thanks for your help. I don't have answers to all of Nick's
questions, since the deployment is built by another team and bundles many
other components such as ZooKeeper, Hadoop, HBase, Hive, and Oozie.

I followed Himanshu's suggestion and checked hbase.id on two different
problematic clusters. The IDs are different, so that seems normal to me.
About the deployment: I did a clean install (at least that was my
intention) and am not re-using existing znodes. The installation stops
everything (ZooKeeper, Hadoop, HBase, etc.), removes all the files and data,
and then installs everything again, so nothing should be left over.

Let me describe the current setup and my investigation so far. Rows can be
replicated from the correct cluster to the problematic cluster, but not
from the problematic one, EVEN though both have the same hbase.jar.

*Problematic Cluster:*
name = bdvm134
/hbase/hbase.id =  $b13a0e3a-2bec-4e13-8b1d-043aa1a66443
> list_peers  (I added two peers just for debugging purposes)
 PEER_ID CLUSTER_KEY STATE
 6 hdtest014.svl.ibm.com:2181:/hbase ENABLED
 7 hdtest014.svl.ibm.com:2181:/hbase ENABLED


*Correct Cluster:*
name = hdtest014
/hbase/hbase.id = ce41a00d-5b0c-44b2-8bf7-bfd35bda1d42
> list_peers
 PEER_ID CLUSTER_KEY STATE
 1 bdvm134.svl.ibm.com:2181:/hbase ENABLED
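
The CLUSTER_KEY values above have the form quorum:clientPort:znodeParent.
As a minimal sketch for comparing a configured peer key against the host
that actually answers, the key can be split into its three parts. ClusterKey
and pointsAt are hypothetical names made up for illustration; they are not
part of HBase.

```java
/** Hypothetical helper (not an HBase class): splits a replication cluster key. */
final class ClusterKey {
    final String quorum;      // ZooKeeper quorum, possibly a comma-separated host list
    final int clientPort;     // ZooKeeper client port
    final String znodeParent; // parent znode, e.g. /hbase

    private ClusterKey(String quorum, int clientPort, String znodeParent) {
        this.quorum = quorum;
        this.clientPort = clientPort;
        this.znodeParent = znodeParent;
    }

    /** Parses "quorum:port:/znode"; rejects anything without exactly three parts. */
    static ClusterKey parse(String key) {
        String[] parts = key.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected quorum:port:znode, got: " + key);
        }
        return new ClusterKey(parts[0], Integer.parseInt(parts[1]), parts[2]);
    }

    /** True when one of the quorum hosts equals the given host name (case-insensitive). */
    boolean pointsAt(String hostname) {
        for (String host : quorum.split(",")) {
            if (host.equalsIgnoreCase(hostname)) {
                return true;
            }
        }
        return false;
    }
}
```

For peer 6 on bdvm134, parse("hdtest014.svl.ibm.com:2181:/hbase") yields a
quorum that does not point at bdvm134.svl.ibm.com, which is what makes a
region server name of bdvm134 for that peer look wrong.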


I injected some debugging code into ReplicationSource.run():
public void run() {
  ....
    LOG.info("Replicating "+clusterId + " -> " + peerClusterId);

    Map<String, ReplicationPeer> peerList = zkHelper.getPeerClusters();

    for (Map.Entry<String, ReplicationPeer> peer : peerList.entrySet()) {
      LOG.info("Demai ---------------begin");
      String peerId_A = peer.getKey();
      ReplicationPeer rPeer = peer.getValue();
      try {
        LOG.info("clusterUUId = "
            + zkHelper.getUUIDForCluster(zkHelper.getZookeeperWatcher()));
        LOG.info("peerUUID = " + zkHelper.getPeerUUID(peerId_A));
      } catch (KeeperException e) {
        LOG.info("exception = " + e);
      }

      LOG.info("peerID = " + peerId_A);
      LOG.info("peer Value=" + rPeer.toString());

      List<ServerName> sList = zkHelper.getSlavesAddresses(peerId_A);
      for (ServerName sName : sList) {
        // value incorrect on problematic cluster
        LOG.info("sName = " + sName.getHostname());
      }
      LOG.info("Peer Cluster=" + rPeer.getClusterKey() + ",Peer ID = "
          + rPeer.getId());
      LOG.info("Demai ---------------end");
    }
...
}



On the bdvm134 regionserver:
2013-11-01 10:20:44,757 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening
log for replication bdvm134.svl.ibm.com%2C60020%2C1383324585548.1383324589592
at 3073
2013-11-01 10:20:44,761 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Replicating b13a0e3a-2bec-4e13-8b1d-043aa1a66443 ->
b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:20:44,761 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Demai
---------------begin
2013-11-01 10:20:44,773 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
clusterUUId = b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:20:44,777 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
peerUUID = b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:20:44,777 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: peerID
= 6
2013-11-01 10:20:44,777 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: peer
Value=org.apache.hadoop.hbase.replication.ReplicationPeer@33bb33bb
2013-11-01 10:20:44,779 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: sName =
bdvm134.svl.ibm.com
2013-11-01 10:20:44,779 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Peer
Cluster=6,Peer ID = hdtest014.svl.ibm.com:2181:/hbase
2013-11-01 10:20:44,779 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Demai
---------------end
2013-11-01 10:20:44,779 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Demai
---------------begin
2013-11-01 10:20:44,786 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
clusterUUId = b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:20:44,790 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
peerUUID = b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:20:44,790 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: peerID
= 7
2013-11-01 10:20:44,790 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: peer
Value=org.apache.hadoop.hbase.replication.ReplicationPeer@710071
2013-11-01 10:20:44,792 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: sName =
*bdvm134.svl.ibm.com*
2013-11-01 10:20:44,792 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Peer
Cluster=7,Peer ID = *hdtest014.svl.ibm.com*:2181:/hbase
2013-11-01 10:20:44,792 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Demai
---------------end
2013-11-01 10:20:44,794 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening
log for replication bdvm134.svl.ibm.com%2C60020%2C1383324585548.1383324589592
at 3073


On the hdtest014 regionserver:
2013-11-01 10:25:01,260 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
Replicating ce41a00d-5b0c-44b2-8bf7-bfd35bda1d42 ->
b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:25:01,260 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Demai
---------------begin
2013-11-01 10:25:01,263 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
clusterUUId = ce41a00d-5b0c-44b2-8bf7-bfd35bda1d42
2013-11-01 10:25:01,279 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
peerUUID = b13a0e3a-2bec-4e13-8b1d-043aa1a66443
2013-11-01 10:25:01,279 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: peerID
= 1
2013-11-01 10:25:01,279 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: peer
Value=org.apache.hadoop.hbase.replication.ReplicationPeer@70897089
2013-11-01 10:25:01,281 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: sName =
*bdvm134.svl.ibm.com*
2013-11-01 10:25:01,281 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Peer
Cluster=1,Peer ID = *bdvm134.svl.ibm.com*:2181:/hbase
2013-11-01 10:25:01,281 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Demai
---------------end
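
To spot this pattern without reading the logs by eye, the "Replicating
<source> -> <peer>" lines can be scanned for pairs where both UUIDs match.
This is only a rough sketch: SelfReplicationScan is a hypothetical class
name, and the pattern assumes the message format shown in the logs above,
including the line wrapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log scanner (not part of HBase). */
final class SelfReplicationScan {
    // \s+ also matches newlines, so wrapped log lines are handled.
    private static final Pattern REPLICATING = Pattern.compile(
        "Replicating\\s+([0-9a-fA-F-]{36})\\s+->\\s+([0-9a-fA-F-]{36})");

    /** Returns the cluster IDs that appear as both source and peer in one log entry. */
    static List<String> selfReplicatingIds(String logText) {
        List<String> ids = new ArrayList<>();
        Matcher m = REPLICATING.matcher(logText);
        while (m.find()) {
            if (m.group(1).equalsIgnoreCase(m.group(2))) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }
}
```

Run against the two excerpts above, the bdvm134 log would be flagged and
the hdtest014 log would not.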



On Fri, Nov 1, 2013 at 10:07 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> Are you re-deploying over an existing installation? Is it your intention to
> preserve data between deployments or are you running in a testing
> environment? Are you clearing ZK as part of deploying a fresh cluster or
> are you re-using existing znodes? How did you configure replication in the
> shell? Can you provide those commands? I'd request debug logs from
> o.a.h.h.regionserver.Replication but i don't see much logging in there
> anyway.
>
> Basically, can you repro this in a fresh deployment? As Himanshu points
> out, I'm suspect of stale configuration hanging around.
>
>
> On Thu, Oct 31, 2013 at 8:02 PM, Demai Ni <nidmgg@gmail.com> wrote:
>
> > Nick,
> >
> > thanks for looking into this problem. I attached the hbase-site.xml in
> > this email. Just like to point out that I have to tear down the cluster I
> > posted the original log. so the hbase-site.xml is from another
> > cluster(single-node) with the same problem.
> >
> > BTW, I did some investigation this afternoon and don't think this is a
> > problem within hbase code. (background: I am working within a software
> > team, and quite a few engineers change hbase, hadoop, and other codes
> > everyday) I tried out several different installations, and found out a
> > week ago's build with today's hbase build work just fine; but today's
> > build with last week's hbase doesn't. Our build includes hadoop 2, which
> > can introduce something problematic.
> >
> > wondering how hbase generate UUID? maybe that is something I should look
> > into? thanks
> >
> > Demai
> >
> >
> >
> >
> >
> > On Thu, Oct 31, 2013 at 6:20 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> >> Can you post your replication settings from hbase-site.xml?
> >>
> >> On Thursday, October 31, 2013, Demai Ni wrote:
> >>
> >> > hi, folks,
> >> >
> >> > I got a strange thing happening on my cluster(hbase 0.94.9) recently.
> >> > I am setting up a new cluster for replication, and didn't see the data
> >> > being replicated over the peer. Then, I found the following in the log
> >> > of the regionserver of the Master:
> >> >
> >> > 2013-10-31 13:33:03,293 INFO org.apache.hadoop.hbase.metrics: new
> >> > MBeanInfo
> >> > 2013-10-31 13:33:03,300 INFO
> >> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> >> > Getting 1 rs from peer cluster # 3
> >> > 2013-10-31 13:33:03,300 INFO
> >> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> >> > Choosing peer hdtest018.svl.ibm.com,60020,1383251582072
> >> > 2013-10-31 13:33:03,302 INFO
> >> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> >> > Replicating *b520de1d-3a18-4aec-bd45-de000e81417d* ->
> >> > *b520de1d-3a18-4aec-bd45-de000e81417d*
> >> >
> >> > the log is from ReplicationSource:
> >> > *LOG.info("Replicating "+clusterId + " -> " + peerClusterId);*
> >> >
> >> > It seems the problematic cluster is replicating to itself.
> >> > Any suggestion about how to look into this problem? Many thanks
> >> >
> >> > BTW, I can replicate from another cluster to this problematic one.
> >> >
> >> > Demai
> >> >
> >>
> >
> >
>
