From: Gopal Vijayaraghavan
To: "user@hive.apache.org"
Date: Tue, 06 Sep 2016 10:04:40 -0700
Subject: Re: hive.root.logger influencing query plan?? so it's not so

> another case of a query hangin' in v2.1.0.

I'm not sure that's a hang.
If you can repro this, can you please do a jstack while it is "hanging" (like a jstack of hiveserver2 or cli)?

I have a theory that you're hitting a slow path in HDFS remote read because of the following stacktrace.

    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2101)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2508)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484)

Notice that it is firing off a 4-byte HDFS read call without buffering - this is probably because compression is usually the natural buffering mode for SequenceFiles. The uncompressed data might be triggering a 4-byte remote read directly, which would be an extremely slow way to read data out of HDFS.

> * so empty result expected.

The empty result is the worst-case scenario for the FetchTask optimization, because it means the CLI tool deserializes every single row in a single thread. ORC, which has internal indexes, is somewhat safe against that.

> set hive.fetch.task.conversion=none;
> but not sure its the right thing to set globally just yet.

No, it's not - the right setting is to tune the size threshold for that optimization:

    hive.fetch.task.conversion.threshold

Setting that to <=1G bytes can be a win, while setting that to -1 can cause so much pain.

Cheers,
Gopal
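
For reference, a minimal sketch of what the threshold tuning described above could look like as session-level settings. The byte value is just 1G written out, not a measured recommendation, and whether to apply it per session or in the site configuration is left open here.

```sql
-- Cap the input size for which the fetch-task optimization is applied.
-- 1073741824 bytes = 1G, the upper bound mentioned above; smaller values are also reasonable.
set hive.fetch.task.conversion.threshold=1073741824;

-- Per-query fallback quoted earlier in the thread (disables the optimization entirely);
-- shown commented out because it is not meant to be set globally.
-- set hive.fetch.task.conversion=none;
```

Running `set hive.fetch.task.conversion.threshold;` with no value prints the setting currently in effect, which is a quick way to check that it is not -1.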