hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: hive.root.logger influencing query plan?? so it's not so
Date Tue, 06 Sep 2016 17:04:40 GMT
> another case of a query hangin' in v2.1.0.

I'm not sure that's a hang. If you can repro this, can you please do a jstack while it is
"hanging" (like a jstack of hiveserver2 or cli)?

I have a theory that you're hitting a slow path in HDFS remote reads, judging by the following
stack trace.

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2101)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2508)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484)

Notice that it is firing off a 4-byte HDFS read call without buffering. This is probably
because the compression codec usually provides the natural buffering layer for SequenceFiles.

The uncompressed data might be triggering a 4-byte remote read directly, which would be an
extremely slow way to read data out of HDFS.
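
To illustrate the buffering theory outside of HDFS, here is a minimal sketch that counts how many
read calls reach the underlying stream when DataInputStream.readInt() is used with and without a
BufferedInputStream in between. CountingStream and ReadIntDemo are made-up names for this example,
and an in-memory byte array stands in for the remote stream, so it only shows the call pattern,
not actual HDFS latency.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a remote stream: counts the read calls that reach it.
class CountingStream extends InputStream {
    private final InputStream in;
    int calls = 0;

    CountingStream(byte[] data) {
        this.in = new ByteArrayInputStream(data);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

class ReadIntDemo {
    // Reads `ints` 4-byte integers and returns how many calls hit the underlying stream.
    static int underlyingCalls(int ints, boolean buffered) throws IOException {
        CountingStream raw = new CountingStream(new byte[ints * 4]);
        // Assumption: a 64 KB buffer, chosen arbitrarily for the illustration.
        InputStream src = buffered ? new BufferedInputStream(raw, 64 * 1024) : raw;
        DataInputStream data = new DataInputStream(src);
        for (int i = 0; i < ints; i++) {
            data.readInt();
        }
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered calls: " + underlyingCalls(1000, false));
        System.out.println("buffered calls:   " + underlyingCalls(1000, true));
    }
}
```

Without the buffer, every readInt() reaches the underlying stream at least once. If that stream
is a remote HDFS read, each of those calls can be a tiny round trip.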

> * so empty result expected.

The empty result is the worst-case scenario for the FetchTask optimization, because it means
the CLI tool deserializes every single row in a single thread.

ORC, which has internal indexes, is somewhat safe against that.

> set hive.fetch.task.conversion=none;
> but not sure its the right thing to set globally just yet.

No, it's not - the right setting is to tune the size threshold for that optimization.

hive.fetch.task.conversion.threshold;

Setting that to 1 GB or less can be a win, while setting it to -1 can cause a lot of pain.
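
As a rough mental model of how that threshold gates the optimization, here is a small sketch. It is
not Hive's actual code: FetchThresholdSketch and mayUseFetchTask are hypothetical names, and the
sketch assumes a negative threshold means "no size limit" and that the comparison is inclusive.

```java
class FetchThresholdSketch {
    // Hypothetical helper, not Hive's implementation.
    // inputBytes: total size of the files the query would read.
    // thresholdBytes: the value of hive.fetch.task.conversion.threshold.
    static boolean mayUseFetchTask(long inputBytes, long thresholdBytes) {
        if (thresholdBytes < 0) {
            // Assumption: -1 removes the size limit, so any input qualifies.
            return true;
        }
        // Assumption: inputs up to and including the threshold qualify.
        return inputBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        long tenGb = 10L << 30;
        System.out.println(mayUseFetchTask(tenGb, oneGb)); // large input, 1 GB threshold
        System.out.println(mayUseFetchTask(tenGb, -1L));   // large input, no limit
    }
}
```

Under those assumptions, a threshold of -1 lets even a very large scan run as a single-threaded
fetch in the client, which is the painful case described above.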

Cheers,
Gopal




