hbase-user mailing list archives

From Akshay Singh <akshay_i...@yahoo.com>
Subject Re: Slow start of HBase operations with YCSB, possibly because of zookeeper ?
Date Fri, 18 Jan 2013 15:55:41 GMT
I found the problem, so I thought I would post it here for future reference.

The problem was an IPv6-enabled network. IPv6 was already disabled in HDFS (HADOOP_OPTS=-Djava.net.preferIPv4Stack=true)
and in HBase (-Djava.net.preferIPv4Stack=true), but on some of the machines in the cluster
it was not disabled in the kernel (through sysctl).

So HBase was using IPv6 for its services on some of the hosts. My guess is that at the start of
every workload HBase tries to resolve AAAA records, which eventually times out. It then
falls back to the IPv4 address, and that's when operations start running at the normal rate.
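
A rough way to check this guess on a given host is to time name resolution from the JVM and look at which address family comes back. The sketch below is only an illustration, not what HBase does internally: the class name ResolveCheck and the helper family() are made up for this example, and it relies only on java.net.InetAddress from the JDK.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for this thread; not part of HBase or YCSB.
class ResolveCheck {

    // Returns "IPv6" or "IPv4" for the first address the name resolves to,
    // or "unresolved" if the lookup fails.
    static String family(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            return (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
        } catch (UnknownHostException e) {
            return "unresolved";
        }
    }

    public static void main(String[] args) {
        // Host to look up; defaults to localhost when no argument is given.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));

        long start = System.nanoTime();
        try {
            InetAddress[] all = InetAddress.getAllByName(host);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println("Resolved " + host + " in " + elapsedMs + " ms");
            for (InetAddress addr : all) {
                String kind = (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
                System.out.println("  " + kind + "  " + addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println("Lookup of " + host + " failed after " + elapsedMs + " ms");
        }
    }
}
```

Running it against a region server hostname, once with -Djava.net.preferIPv4Stack=true and once without, should show whether lookups are slow and whether IPv6 addresses are being returned on that machine.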

On the same note, surprisingly, on one of the hosts disabling IPv6 through sysctl (persisted
in sysctl.conf) was not enough to stop HBase from using IPv6. I had to disable
IPv6 in grub (the default grub cmdline in /etc/default/grub) on this host.

Once there was *no IPv6 whatsoever* in the cluster, YCSB clients started doing operations on
HBase immediately.
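
To confirm that a host really has no IPv6 left, one option is to list every IPv6 address the JVM can still see on its network interfaces. This is a minimal sketch under my own assumptions: Ipv6Audit and findIpv6Addresses() are hypothetical names, and the code uses only java.net.NetworkInterface. As far as I know, a JVM started with -Djava.net.preferIPv4Stack=true hides IPv6 addresses, so run it without that flag.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for this thread; not part of HBase or YCSB.
class Ipv6Audit {

    // Returns one "<interface> <address>" entry per IPv6 address found.
    // An empty list means the JVM sees no IPv6 address on this host.
    static List<String> findIpv6Addresses() {
        List<String> found = new ArrayList<String>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return found; // no interfaces reported at all
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet6Address) {
                        found.add(nic.getName() + " " + addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            System.err.println("Could not enumerate interfaces: " + e.getMessage());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = findIpv6Addresses();
        if (found.isEmpty()) {
            System.out.println("No IPv6 addresses visible to the JVM");
        } else {
            for (String entry : found) {
                System.out.println("IPv6 still present: " + entry);
            }
        }
    }
}
```

Any line of output naming an interface (including a loopback ::1 entry) suggests IPv6 is still active in the kernel on that host.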

Thanks,
Akshay



________________________________
 From: Akshay Singh <akshay_iiit@yahoo.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Tuesday, 15 January 2013 10:36 AM
Subject: Re: Slow start of HBase operations with YCSB, possibly because of zookeeper ?
 
Thanks Samar.

You are right that YCSB writes data to a single table, 'usertable', but I see very slow operations
(on the order of 1-2 operations/second) even for the read/update workload, not only for inserts.
The region is already split across multiple RS before I start my transaction workload.

And keys are fairly random in YCSB, so I doubt the slow operations are due to the
table initially being limited to one region.

To my knowledge this should have something to do with Zookeeper because, as I said in the original
mail, if I increase "hbase.zookeeper.watcher.sync.connected.wait" (to 10 sec) I don't see
the exceptions thrown by ZooKeeperWatcher, which I do see with the default value of 2s. I have a
stand-alone zookeeper instance, to which all RS connect.

Any other component I should closely monitor ?

Thanks,
Akshay



________________________________
From: samar kumar <samar.opensource@gmail.com>
To: user@hbase.apache.org 
Sent: Tuesday, 15 January 2013 3:58 AM
Subject: Re: Slow start of HBase operations with YCSB, possibly because of zookeeper ?

YCSB would be writing all data to one table. So initially, when the table
is small or just created, all the writes would go to one RS. As the table
grows, the Region is split across different RS. This would allow parallel
writes, if the keys are random, and could possibly make the writes faster.
Samar

On 15/01/13 6:34 AM, "Akshay Singh" <akshay_iiit@yahoo.com> wrote:

> 
>Hi hbase users,
>
>I am running HBase (on top of HDFS) in
>distributed mode (on 8 VMs), and things like JPS look fine on all the
>machines in the cluster. I am also able to run hbase shell and
>interact with HBase through it. But when I want to benchmark my HBase
>cluster with YCSB (Yahoo! Cloud System Benchmark,
>https://github.com/brianfrankcooper/YCSB/) I see this weird problem
>of HBase operations starting slowly and then picking up later.
>
>Basically when I start the YCSB
>workload from a client machine, I see these problems in chronological
>order :
>
>1) ERROR zookeeper.ZooKeeperWatcher: ZK
>is null on connection event
>
>###########
>ERROR zookeeper.ZooKeeperWatcher: ZK is null on connection event -- see stack trace for the stack trace when constructor was called on this zkw
>java.lang.Exception: ZKW CONSTRUCTOR STACK TRACE FOR DEBUGGING
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:142)
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:126)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1322)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:584)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:827)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
>at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
>at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
>at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:131)
>at com.yahoo.ycsb.db.HBaseClient.getHTable(HBaseClient.java:155)
>###########
>
>2) org.apache.zookeeper.ClientCnxn -
>Error while calling watcher
>java.lang.NullPointerException: ZK
>is null
>
>############
>ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
>java.lang.NullPointerException: ZK is null
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:334)
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
>at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
>at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
>############
>
>3) And then finally it starts the
>operation on HBase (which means Zookeeper is running fine and can be
>connected to )
>
>4) The operations remain below 10
>ops/sec for the first 60-70 sec, and then grow gradually to reach around
>1300 ops/sec (the normally expected number)
>
>Here are the actual logs :: http://pastebin.com/NC1zKwRF
>
>I am running
>1) Hadoop-1.0.1
>2) HBase-0.94.1
>3) Zookeeper-3.3.6
>4) Java 1.6.0_24 (openJDK-6)
>5) OS : Ubuntu-11.10
>6) YCSB-0.14
>
>What I have already tried :
>
>1) Checked my DNS setting (just to be
>sure .. using synced /etc/hosts file) .. no luck
>2) Increasing
>"hbase.zookeeper.watcher.sync.connected.wait" to 10000
>(default: 2000); this gets rid of the "ZK is null ****" errors,
>but the slow start is still an issue, with no improvement.
>
>I am clueless as to what may be the
>reason behind this 'slowly picking up' behavior of my set-up.
>Please advise.
>
>Thanks,
>Akshay