Subject: Checking the number of Readers
From: James Pirz
To: user@hive.apache.org
Date: Fri, 11 Sep 2015 17:41:17 -0700

I am using Hive 1.2.0 on Hadoop 2.6 (on a cluster with 10 machines) and I am trying to understand the performance of a full-table scan. I am running the following query:

    SELECT * FROM LINEITEM
    WHERE L_LINENUMBER < 0;

and I am measuring its performance in different scenarios: "MR vs. Tez", and different table types/formats (an external table on text data, or ORC).

My question is:
What is the best way to check the number of readers (scanners) that Hive uses in parallel to read the data?

My data is in HDFS, and on each node I have 1 datanode process running, which writes its blocks into 3 separate paths (each path persists its data on a separate disk).

I tried to get this info using "explain" or from the available consoles, but I could not find it. Checking the number of established connections to the datanode's data transfer port (using the command below) gives me 12, but I am not sure if I am looking at the correct metric:

    netstat -anp | grep -w 50010 | grep ESTABLISHED | wc -l

Any help would be appreciated.

Thanks
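
One caveat about the netstat count: `grep -w 50010` matches the port on either end of a connection, so when a task reads from the datanode on its own machine, that one connection can appear twice in the output. Below is a minimal sketch, assuming the usual Linux `netstat -anp` column layout, that counts the two ends separately. The function name, the labels and the sample addresses are hypothetical and only for illustration; this is not something Hive or Hadoop provides.

```python
from collections import Counter

DATA_TRANSFER_PORT = 50010  # the port used in the netstat command in the message


def count_datanode_connections(netstat_output, port=DATA_TRANSFER_PORT):
    """Count ESTABLISHED TCP sockets, split by which end uses `port`.

    Assumes the column order: Proto Recv-Q Send-Q Local Foreign State ...
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header line, non-TCP sockets and short lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        local_port = local.rsplit(":", 1)[-1]
        foreign_port = foreign.rsplit(":", 1)[-1]
        if local_port == str(port):
            counts["datanode_side"] += 1  # socket owned by the datanode
        elif foreign_port == str(port):
            counts["client_side"] += 1    # socket owned by a local client
    return counts


# Hypothetical sample: one local client, one remote client, one listener.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address    Foreign Address  State       PID/Program name
tcp        0      0 10.0.0.1:50010   10.0.0.1:41000   ESTABLISHED 1234/java
tcp        0      0 10.0.0.1:41000   10.0.0.1:50010   ESTABLISHED 5678/java
tcp        0      0 10.0.0.1:50010   10.0.0.2:41002   ESTABLISHED 1234/java
tcp        0      0 0.0.0.0:50010    0.0.0.0:*        LISTEN      1234/java
"""

if __name__ == "__main__":
    print(count_datanode_connections(SAMPLE))
```

In this sample the original pipeline would report 3, although the datanode has only 2 open connections. The data transfer port also carries writes and block replication, so even the datanode-side count is at best an upper bound on the number of concurrent readers.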