hbase-user mailing list archives

From Shawn Quinn <squ...@moxiegroup.com>
Subject Possible ZooKeeper Connection Leak in TableOutputFormat
Date Tue, 24 Jan 2012 17:15:40 GMT

Our application runs Map/Reduce tasks fairly frequently against HBase
(Cloudera distribution 0.90.4), and we're making use of the default
org.apache.hadoop.hbase.mapreduce.TableOutputFormat class for the reduce
step, which TableMapReduceUtil.initTableReducerJob() sets up.  We invoke
the Map/Reduce tasks via the standard Hadoop Job API, but they're all
triggered from the same virtual machine, which stays running (so we aren't
shutting down the virtual machine after each job runs).  We've noticed
that we run out of ZooKeeper connections in this configuration, and
believe we've tracked the "leak" down to the TableOutputFormat class.
Specifically, that class does the following:

  public void setConf(Configuration otherConf) {
    // Clones the incoming configuration, so this.conf is a new instance
    // on every call.
    this.conf = HBaseConfiguration.create(otherConf);
    String tableName = this.conf.get(OUTPUT_TABLE);
    String address = this.conf.get(QUORUM_ADDRESS);
    String serverClass = this.conf.get(REGION_SERVER_CLASS);
    String serverImpl = this.conf.get(REGION_SERVER_IMPL);
    try {
      if (address != null) {
        ZKUtil.applyClusterKeyToConf(this.conf, address);
      }
      if (serverClass != null) {
        this.conf.set(HConstants.REGION_SERVER_CLASS, serverClass);
        this.conf.set(HConstants.REGION_SERVER_IMPL, serverImpl);
      }
      // The HTable is built from the cloned configuration.
      this.table = new HTable(this.conf, tableName);
      LOG.info("Created table instance for "  + tableName);
    } catch(IOException e) {
      // ...
    }
  }

I believe this was different in previous releases of HBase, but at some
point the code that clones the configuration object (the first line of that
method) was added.  Then, when that same method creates the HTable
instance, the HTable internally gets a new connection to ZooKeeper
every time (since the configuration instance is different).
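
The effect described above can be reproduced with a toy model.  This is
only a sketch, not HBase code: Conf, Connection, ConnectionCache and
LeakSketch are hypothetical stand-ins, and the sketch assumes that the
connection cache is keyed on the configuration instance itself rather than
on its contents.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Configuration; not the real class.
class Conf {
    final String quorum;
    Conf(String quorum) { this.quorum = quorum; }
    // Mimics cloning a configuration: same settings, new object.
    static Conf create(Conf other) { return new Conf(other.quorum); }
}

// Hypothetical stand-in for a ZooKeeper-backed connection.
class Connection {}

// Toy cache keyed on configuration identity (an assumption of this sketch).
class ConnectionCache {
    private final Map<Conf, Connection> instances = new IdentityHashMap<>();

    Connection get(Conf conf) {
        return instances.computeIfAbsent(conf, c -> new Connection());
    }

    int openConnections() { return instances.size(); }
}

class LeakSketch {
    // Simulates `jobs` runs that each clone the configuration first.
    static int runWithClone(int jobs) {
        ConnectionCache cache = new ConnectionCache();
        Conf base = new Conf("zk1:2181");
        for (int i = 0; i < jobs; i++) {
            cache.get(Conf.create(base));
        }
        return cache.openConnections();
    }

    // Simulates `jobs` runs that all pass the same configuration instance.
    static int runShared(int jobs) {
        ConnectionCache cache = new ConnectionCache();
        Conf base = new Conf("zk1:2181");
        for (int i = 0; i < jobs; i++) {
            cache.get(base);
        }
        return cache.openConnections();
    }

    public static void main(String[] args) {
        System.out.println(runWithClone(5)); // prints 5: one connection per job
        System.out.println(runShared(5));    // prints 1: the connection is reused
    }
}
```

In this model the settings are identical in every run; only the object
identity differs, and that alone is enough to defeat the cache.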

I believe I can work around this in my application by creating a custom
TableOutputFormat.  However, can anyone confirm whether this is indeed a
problem, or whether there is some other way to keep the default
TableOutputFormat class from creating a new connection to ZooKeeper every
time it runs?
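
One possible shape for such a custom output format, again as a toy model
rather than real HBase code, is to release the connection explicitly when
the writer is closed.  Every name below (WorkaroundSketch, OPEN,
CustomOutputFormat) is hypothetical, and the actual call needed to release
a connection in a given HBase version is deliberately not shown.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class WorkaroundSketch {
    // Toy registry of open connections, keyed on configuration identity
    // (an assumption of this sketch, not the real HBase data structure).
    static final Map<Object, Object> OPEN = new IdentityHashMap<>();

    // Hypothetical custom output format: it still uses a fresh
    // configuration object per instance, but releases its connection
    // when it is closed.
    static class CustomOutputFormat implements AutoCloseable {
        private final Object conf = new Object(); // stands in for the cloned config

        CustomOutputFormat() {
            OPEN.put(conf, new Object()); // "open" a connection for this config
        }

        @Override
        public void close() {
            OPEN.remove(conf); // release the connection tied to this config
        }
    }

    // Runs `jobs` simulated jobs and reports how many connections remain.
    static int openAfter(int jobs) {
        OPEN.clear();
        for (int i = 0; i < jobs; i++) {
            try (CustomOutputFormat format = new CustomOutputFormat()) {
                // Records would be written here.
            }
        }
        return OPEN.size();
    }

    public static void main(String[] args) {
        System.out.println(openAfter(5)); // prints 0: nothing is left open
    }
}
```

The design choice here is to tie the connection's lifetime to the writer,
so a long-running virtual machine does not accumulate one connection per
job.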


