hive-user mailing list archives

From James Pirz <james.p...@gmail.com>
Subject Checking the number of Readers
Date Sat, 12 Sep 2015 00:41:17 GMT
I am using Hive 1.2.0 on Hadoop 2.6 (on a cluster with 10 machines) and I
am trying to understand the performance of a full-table scan. I am running
the following query:

SELECT * FROM LINEITEM
WHERE L_LINENUMBER < 0;

and I am measuring its performance in different scenarios: MR vs. Tez as
the execution engine, and different table types/formats (an external table
over text data, or ORC).
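For the MR runs, each map task reads one input split, so the split count is a rough proxy for the number of parallel readers. The Python sketch below mirrors the split-size rule in Hadoop's `FileInputFormat` (split size = max(minSize, min(maxSize, blockSize)), with a 1.1 slack factor before cutting another split). It assumes splittable, uncombined files and a 128 MB block size. Hive 1.2 defaults to `CombineHiveInputFormat`, which merges splits, and ORC tables use their own stripe-based split logic, so the real map count can differ. The names `count_splits` and `compute_split_size` and the file sizes are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # slack factor Hadoop's FileInputFormat allows on the last split


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Same rule as FileInputFormat.computeSplitSize
    return max(min_size, min(max_size, block_size))


def count_splits(file_sizes, block_size=128 * MB, min_size=1, max_size=2**63 - 1):
    """Estimate how many splits (map tasks) plain FileInputFormat would create."""
    split_size = compute_split_size(block_size, min_size, max_size)
    total = 0
    for length in file_sizes:
        if length == 0:
            total += 1  # a zero-length file still yields one (empty) split
            continue
        remaining = length
        # Cut full-size splits while more than 1.1 split sizes remain
        while remaining / split_size > SPLIT_SLOP:
            total += 1
            remaining -= split_size
        if remaining > 0:
            total += 1  # the tail (up to 1.1 * split_size) becomes the last split
    return total


# Hypothetical table: one 1 GB file and one 140 MB file
print(count_splits([1024 * MB, 140 * MB]))  # 8 + 1 = 9
```

The 140 MB file yields a single split because 140/128 is below the 1.1 slack factor.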

My question is:
What is the best way to check the number of readers (scanners) that Hive
uses in parallel to read the data?
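Under Tez, Hive usually does not launch one task per raw split: the Tez split grouper merges splits into groups, and each group becomes one task of the map vertex. The following is a simplified Python sketch of that sizing heuristic, assuming the Tez defaults `tez.grouping.split-waves` = 1.7, `tez.grouping.min-size` = 50 MB and `tez.grouping.max-size` = 1 GB, plus a hypothetical `available_slots` value for the number of containers the cluster can run at once. The real grouper also groups by data locality, so treat the result as an estimate only; the function name `estimate_tez_tasks` is illustrative.

```python
MB = 1024 * 1024


def estimate_tez_tasks(total_bytes: int, original_splits: int, available_slots: int,
                       waves: float = 1.7, min_group: int = 50 * MB,
                       max_group: int = 1024 * MB) -> int:
    """Simplified Tez split-grouping size heuristic (ignores locality)."""
    # Start from "waves" worth of the containers the cluster can run at once
    desired = int(available_slots * waves)
    if desired > 0 and total_bytes > 0:
        per_group = total_bytes // desired
        if per_group > max_group:
            # Groups would be too large: fall back to max-size groups
            desired = total_bytes // max_group + 1
        elif per_group < min_group:
            # Groups would be too small: fall back to min-size groups
            desired = total_bytes // min_group + 1
    # If grouping would not reduce the split count, the original splits are kept
    if desired <= 0 or desired >= original_splits:
        return original_splits
    return desired


# Hypothetical: 10 GB table, 80 raw splits, 40 containers available at once
print(estimate_tez_tasks(10 * 1024 * MB, original_splits=80, available_slots=40))  # 68
```

Because the target depends on cluster capacity, the same table can be read by a different number of Tez tasks than MR map tasks.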

My data is in HDFS, and on each node I have one DataNode process running,
which writes its blocks into 3 separate paths (each path persists its data
on a separate disk).

I tried to get this info from "explain" or from the available consoles,
but I could not find it. Checking the number of established connections
to the DataNode data transfer port (using the command below) gives me
12, but I am not sure if I am looking at the correct metric:

netstat -anp | grep -w 50010 | grep ESTABLISHED | wc -l
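One caveat with the `grep -w 50010` pipeline is that it counts connections in both directions (local port 50010 means a client is reading from or writing to this DataNode; remote port 50010 means this host is talking to some DataNode), and it is only a point-in-time snapshot that also includes write and replication traffic. A hypothetical Python helper that splits the count by direction, assuming the Linux `netstat -anp` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name) and the Hadoop 2.x default DataNode data port 50010:

```python
def count_datanode_connections(netstat_output: str, port: int = 50010) -> dict:
    """Split ESTABLISHED TCP connections touching `port` by direction.

    Assumes Linux `netstat -anp` rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
    """
    counts = {"inbound": 0, "outbound": 0}
    suffix = ":" + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP/unix sockets, and non-established TCP states
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "ESTABLISHED":
            continue
        local_addr, foreign_addr = fields[3], fields[4]
        if local_addr.endswith(suffix):
            counts["inbound"] += 1   # a client is connected to this DataNode's data port
        elif foreign_addr.endswith(suffix):
            counts["outbound"] += 1  # this host is connected to a DataNode's data port
    return counts
```

On a node, the live input could come from `subprocess.run(["netstat", "-anp"], capture_output=True, text=True).stdout`; sampling it repeatedly while the query runs gives a better picture than a single reading.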


Any help would be appreciated.

Thnx
