hive-user mailing list archives

From Gopal Vijayaraghavan
Subject Re: hive.root.logger influencing query plan?? so it's not so
Date Tue, 06 Sep 2016 17:04:40 GMT
> another case of a query hangin' in v2.1.0.

I'm not sure that's a hang. If you can repro this, can you please take a jstack while it is
"hanging" (a jstack of the HiveServer2 or CLI process)?

I have a theory that you're hitting a slow path in HDFS remote read because of the following

        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(

Notice that it is firing off a 4-byte HDFS read call without buffering - this is probably
because compression usually provides the natural buffering layer for SequenceFiles.

The uncompressed data might be triggering a 4-byte remote read directly, which would be an
extremely slow way to read data out of HDFS.
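
To illustrate why tiny unbuffered reads hurt, here is a minimal Python sketch. It is an
analogy only, not HDFS or Hive code: CountingRaw is a hypothetical in-memory stand-in for a
remote stream, and it simply counts how many read calls reach the underlying source when
4-byte integers are read with and without a buffer in between.

```python
import io
import struct


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a remote stream; counts calls that reach the source."""

    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here models one round trip to the remote source.
        self.calls += 1
        return self._buf.readinto(b)


def read_ints(stream, count):
    """Read `count` big-endian 4-byte integers, one 4-byte read at a time."""
    return [struct.unpack(">i", stream.read(4))[0] for _ in range(count)]


N = 10_000
payload = b"".join(struct.pack(">i", i) for i in range(N))

# Unbuffered: each 4-byte read goes straight to the source.
raw = CountingRaw(payload)
unbuffered_values = read_ints(raw, N)
unbuffered_calls = raw.calls

# Buffered: a 64 KiB buffer absorbs the small reads.
raw2 = CountingRaw(payload)
buffered = io.BufferedReader(raw2, buffer_size=64 * 1024)
buffered_values = read_ints(buffered, N)
buffered_calls = raw2.calls

print(f"unbuffered source calls: {unbuffered_calls}")
print(f"buffered source calls:   {buffered_calls}")
```

In the unbuffered case every 4-byte read becomes its own call to the source; with a buffer
in between, the same reads are served from memory after a handful of large fetches. On a
remote HDFS read, each of those source calls could cost a network round trip.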

> * so empty result expected.

The empty result is the worst-case scenario for the FetchTask optimization, because it means
the CLI tool deserializes every single row in a single thread.

ORC, which has internal indexes, is somewhat safe against that.

> set hive.fetch.task.conversion=none;
> but not sure its the right thing to set globally just yet.

No, it's not - the right approach is to tune the size threshold for that optimization.

Setting that threshold to 1 GB or less can be a win, while setting it to -1 can cause a lot
of pain.
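
The trade-off can be sketched as a small decision rule. This is a simplified model, not
Hive's actual code: the function name is hypothetical, and it assumes the threshold under
discussion is the hive.fetch.task.conversion.threshold property (a value in bytes), with a
negative value disabling the size check and the boundary treated as inclusive.

```python
GIB = 1024 ** 3


def converts_to_fetch_task(input_bytes, threshold_bytes):
    """Simplified model: True if the query would be read directly by the client
    (single-threaded fetch) instead of running as a cluster job."""
    if threshold_bytes < 0:
        # Assumption: a negative threshold means "no size limit".
        return True
    return input_bytes <= threshold_bytes


# A small table under a 1 GiB threshold: direct fetch is likely a win.
print(converts_to_fetch_task(500 * 1024 ** 2, GIB))

# A 50 GiB table under the same threshold: falls back to a cluster job.
print(converts_to_fetch_task(50 * GIB, GIB))

# The same 50 GiB table with the threshold set to -1: the client reads it all.
print(converts_to_fetch_task(50 * GIB, -1))
```

The last case is the painful one: with no size limit, a single client thread ends up
deserializing every row of a large input, which is exactly the scenario that looks like a
hang.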

