hive-user mailing list archives

From Rupinder Singh <>
Subject Unexplained ganglia metric on hbase-hive cluster
Date Fri, 12 Apr 2013 05:49:14 GMT

I am trying to tune the performance of my cluster, which is hosted on Amazon EMR. There
are 2 separate clusters: 1 for HBase and 1 for Hive. The Hive cluster holds no persistent data;
it provides only processing power.
The test cluster from which I generated the metrics has 4 large nodes for HBase (1 master
+ 3 core) and 4 large nodes for Hive (1 master + 3 core).
The process that is being monitored does this:

1. New data files are received by the Hive cluster.

2. Hive inserts the new data into HBase.

3. The Hive cluster then executes a set of HQL queries on the HBase table to generate analytics.

Size of data: the HBase table has 10 million rows of about 1K each.

I have attached Ganglia snapshots for this process from both Hive and HBase clusters. What
is puzzling is:

1. On the Cluster Network graph for HBase, the In and Out lines follow each other closely.
This is strange, since after the initial insert Hive is only selecting data from the HBase table,
so I would expect a lot of Out but nothing on In.

2. The Cluster Network graph for HBase shows 80MB/s Out at the peaks, but the corresponding
peaks on Hive's Cluster Network graph show only 10MB/s In. Why is there such a significant difference
between the data being sent out by HBase and the data being received by Hive? Shouldn't they match?
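
For a rough sense of scale, the figures above can be put into a quick calculation. This is only a sketch. It assumes that 1K means 1024 bytes, that the Ganglia graphs report decimal megabytes per second, and that a query reads the whole table exactly once. The function names are made up for illustration and do not come from Hive, HBase, or Ganglia.

```python
# Back-of-the-envelope check using only the figures quoted in this message.
ROWS = 10_000_000        # rows in the HBase table
ROW_BYTES = 1024         # "about 1K each"; assumption: 1K = 1024 bytes
HBASE_OUT_MBPS = 80      # peak Out on the HBase Cluster Network graph
HIVE_IN_MBPS = 10        # corresponding peak In on the Hive graph
MB = 1_000_000           # assumption: graphs use decimal megabytes


def table_bytes(rows=ROWS, row_bytes=ROW_BYTES):
    """Approximate raw size of the table in bytes."""
    return rows * row_bytes


def scan_seconds(total_bytes, mb_per_sec):
    """Time to move total_bytes at a sustained rate of mb_per_sec."""
    return total_bytes / (mb_per_sec * MB)


if __name__ == "__main__":
    size = table_bytes()
    print(f"table size: {size / 1e9:.2f} GB")
    print(f"one full pass at {HBASE_OUT_MBPS} MB/s: "
          f"{scan_seconds(size, HBASE_OUT_MBPS):.0f} s")
    print(f"one full pass at {HIVE_IN_MBPS} MB/s: "
          f"{scan_seconds(size, HIVE_IN_MBPS):.0f} s")
    print(f"Out/In ratio at the peaks: {HBASE_OUT_MBPS / HIVE_IN_MBPS:.0f}x")
```

Under those assumptions the table is roughly 10 GB, and the two graphs disagree by a factor of 8 at the peaks, which is the gap I am trying to understand.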

Any help or pointers are highly appreciated.

Also uploaded the metrics graphs here:

