hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Naveen Gangam (JIRA)" <>
Subject [jira] [Created] (HIVE-13527) Using deprecated APIs in HBase client causes zookeeper connection leaks.
Date Fri, 15 Apr 2016 19:06:25 GMT
Naveen Gangam created HIVE-13527:

             Summary: Using deprecated APIs in HBase client causes zookeeper connection leaks.
                 Key: HIVE-13527
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 1.1.0
            Reporter: Naveen Gangam
            Assignee: Naveen Gangam

When running queries against hbase-backed hive tables, the following log messages are seen
in the HS2 log.
2016-04-11 07:25:23,657 WARN org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are
using an HTable instance that relies on an HBase-managed Connection. This is usually due to
directly creating an HTable, which is deprecated. Instead, you should create a Connection
object and then request a Table instance from it. If you don't need the Table instance for
your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
2016-04-11 07:25:23,658 INFO org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Creating
an additional unmanaged connection because user provided one can't be used for administrative
actions. We'll close it when we close out the table.

In a HS2 log file, there are 1366 zookeeper connections established but only a small fraction
of them were closed. So lsof would show 1300+ open TCP connections to Zookeeper.
grep "org.apache.zookeeper.ClientCnxn: Session establishment complete on server" * |wc -l
grep "INFO org.apache.zookeeper.ZooKeeper: Session:" * |grep closed |wc -l

According to the comments in TableInputFormatBase, the recommended means for subclasses like
HiveHBaseTableInputFormat is to call initializeTable() instead of setHTable() that it currently
Subclasses MUST ensure initializeTable(Connection, TableName) is called for an instance to
function properly. Each of the entry points to this class used by the MapReduce framework,
{@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)},
will call {@link #initialize(JobContext)} as a convenient centralized location to handle retrieving
the necessary configuration information. If your subclass overrides either of these methods,
either call the parent version or call initialize yourself.

Currently setHTable() also creates an additional Admin connection, even though it is not needed.

So the use of deprecated APIs are to be replaced.

This message was sent by Atlassian JIRA

View raw message