From William Slacum
Subject Hive on Tez much slower than MR
Date Wed, 05 Aug 2015 20:48:06 GMT
Hi all,

I'm using Hive 0.14, Tez 0.5.2, and Hadoop 2.6.0.

I have a very simple query of the form `select count(*) from my_table where
x > 0 and x < 1500`.

The table has ~50 columns in it and not all are populated. My total dataset
size is ~20TB. When I run with MapReduce, I can generally see a mapper pull
through ~100k records in a few seconds. The MR job, in total, takes about 2

If all I do is set `hive.execution.engine=tez`, I end up getting a similar
number of Map tasks for Tez, but after 30 minutes or so they aren't
completed. I don't have much insight into what's going on.

I have confirmed the following:

1) Usually about 10 TezChild tasks are executed on a single node.
2) Each one is using greater than 100% CPU, but less than 150% CPU.
3) When I jstack a random task, it's usually generating a
NumberFormatException. The stack trace will be available below, but it
looks like when an expected byte column is null or empty, LazyInteger#parse
throws a NumberFormatException and LazyByte#init swallows it and sets some
default value.
4) The worker logs a record count every time it reaches a power of 10.
For the MR tasks, it rips through 100k+ in a few seconds. Tez is taking
5-10 minutes for 10,000 records.

My gut tells me that #3 is my issue (with #4 being a symptom), since in my
experience continual exception creation can be a performance killer.
However, I haven't been able to confirm that the logic for processing a row
is actually different between Tez and MR.
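
To make the concern in #3 concrete, here is a minimal Java sketch. It is not Hive's actual code: the class `EmptyFieldParse` and its methods are hypothetical stand-ins for the parse-then-swallow pattern that the stack trace suggests. It contrasts throwing and catching a NumberFormatException for every empty field with a simple length check up front. The timing loop is only a rough illustration, not a rigorous benchmark.

```java
public class EmptyFieldParse {

    // Hypothetical strict parser: throws on empty or non-digit input.
    // (Sketch only: no sign handling and no overflow checks.)
    static int parseStrict(byte[] bytes, int start, int length) {
        if (length == 0) {
            throw new NumberFormatException("empty byte sequence");
        }
        int result = 0;
        for (int i = start; i < start + length; i++) {
            int digit = bytes[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at offset " + i);
            }
            result = result * 10 + digit;
        }
        return result;
    }

    // Parse, swallow the exception, fall back to a default value.
    // Every empty field pays for building an exception and its stack trace.
    static int parseSwallowing(byte[] bytes, int start, int length, int defaultValue) {
        try {
            return parseStrict(bytes, start, length);
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Guarded variant: empty fields are handled without creating an exception.
    static int parseGuarded(byte[] bytes, int start, int length, int defaultValue) {
        if (length == 0) {
            return defaultValue;
        }
        return parseSwallowing(bytes, start, length, defaultValue);
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        int iterations = 1_000_000;
        long sink = 0; // keeps the JIT from discarding the loops

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parseSwallowing(empty, 0, 0, 0);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parseGuarded(empty, 0, 0, 0);
        }
        long t2 = System.nanoTime();

        System.out.println("swallowing: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("guarded:    " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("sink: " + sink);
    }
}
```

Both variants return the same values; they differ only in whether an empty field goes through exception construction, which is where `fillInStackTrace` shows up at the top of the trace.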

Is there anything I should check or tweak to get around this?

Here's the stacktrace:

Thread 6127: (state = IN_VM)

- java.lang.Throwable.fillInStackTrace(int) @bci=0 (Compiled frame;
information may be imprecise)

- java.lang.Throwable.fillInStackTrace() @bci=16, line=783 (Compiled frame)

- java.lang.Throwable.<init>(java.lang.String) @bci=24, line=265 (Compiled
frame)

- java.lang.Exception.<init>(java.lang.String) @bci=2, line=66 (Compiled
frame)

- java.lang.RuntimeException.<init>(java.lang.String) @bci=2, line=62
(Compiled frame)

- java.lang.IllegalArgumentException.<init>(java.lang.String) @bci=2,
line=53 (Compiled frame)

- java.lang.NumberFormatException.<init>(java.lang.String) @bci=2, line=55
(Compiled frame)

- org.apache.hadoop.hive.serde2.lazy.LazyInteger.parseInt(byte[], int, int,
int) @bci=62, line=104 (Compiled frame)

- org.apache.hadoop.hive.serde2.lazy.LazyByte.parseByte(byte[], int, int,
int) @bci=4, line=94 (Compiled frame)

int, int) @bci=15, line=52 (Compiled frame)

@bci=101, line=111 (Compiled frame)

- org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(int)
@bci=6, line=172 (Compiled frame)

org.apache.hadoop.hive.serde2.objectinspector.StructField) @bci=60, line=67
(Compiled frame)

@bci=53, line=394 (Compiled frame)

org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(, @bci=16, line=137
(Compiled frame)

org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx,, @bci=3, line=100
(Compiled frame)

@bci=57, line=492 (Compiled frame)

@bci=20, line=83 (Compiled frame)

- org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord() @bci=40,
line=68 (Compiled frame)

- @bci=9,
line=294 (Compiled frame)

java.util.Map) @bci=224, line=163 (Interpreted frame)

