hive-user mailing list archives

From "David Morel" <david.mo...@amakuru.net>
Subject Skew join failure
Date Fri, 30 Nov 2012 10:10:47 GMT
Hi,

I am trying to solve the "last reducer hangs because of GC because of 
truckloads of data" issue that I have on some queries by using SET 
hive.optimize.skewjoin=true. Unfortunately, every time I try this, I 
encounter an error of the form:
...
2012-11-30 10:42:39,181 Stage-10 map = 100%,  reduce = 100%, Cumulative CPU 406984.1 sec
MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
Ended Job = job_201211281801_0463
java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
         at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
         at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
...

Googling didn't turn up anything on how to debug or solve this, so 
I'd be glad for any pointers on where to start looking.

I'm using CMF4.0 currently, so Hive 0.8.1.
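
For context, a minimal sketch of the kind of session that goes through 
this code path. The table and column names (big_table, other_table, id, 
val) are hypothetical placeholders, not the real schema, and the 
hive.skewjoin.key value shown is only the documented default, not a 
setting taken from the failing job:

```sql
-- Hypothetical reproduction sketch; table and column names are placeholders.

-- Enable the runtime skew join optimization.
SET hive.optimize.skewjoin=true;

-- Number of rows per join key above which the key is treated as skewed
-- (assumed here to be left at its documented default).
SET hive.skewjoin.key=100000;

SELECT a.id, a.val, b.val
FROM big_table a
JOIN other_table b ON (a.id = b.id);
```

With the optimization on, the stack trace above shows 
ConditionalResolverSkewJoin.getTasks listing the hive_skew_join_bigkeys_0 
directory after the join stage ends, which is where the 
FileNotFoundException is raised.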

Thanks a lot!

David Morel
