hive-user mailing list archives

From Shantian Purkad
Subject Skew Join Optimization in hive
Date Tue, 07 Jun 2011 19:31:17 GMT

I have a query that joins 12 different tables (most of them left outer joins), and the query
takes almost 3 hours. 90% of that time is taken by a single reducer, which is getting
the bulk of the data to process.

How can I get around this and get a fair distribution of data across all reducers? I tried
enabling the skew join optimization, but I get the NPE below after the first step of the job has executed.
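
For context, the skew join optimization is normally switched on per session with settings along the following lines. This is only a sketch of the usual configuration: the threshold shown is Hive's documented default, and the exact values used for the failing query are not stated in this message.

```sql
-- Enable runtime skew join handling (off by default).
set hive.optimize.skewjoin=true;

-- A join key seen more than this many times is treated as skewed
-- (100000 is the documented default; the value used here is an assumption).
set hive.skewjoin.key=100000;
```

With these set, Hive adds a conditional follow-up task after the join stage to process the skewed keys separately, which is the step where the trace below fails.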

Any suggestions/ideas will be of great help.


2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
Ended Job = job_201106071542_0010
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(
    at org.apache.hadoop.hive.ql.Driver.launchTask(
    at org.apache.hadoop.hive.ql.Driver.execute(
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(
    at org.apache.hadoop.hive.cli.CliDriver.processLine(
    at org.apache.hadoop.hive.cli.CliDriver.main(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.hadoop.util.RunJar.main(
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask