hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long
Date Fri, 20 Oct 2017 04:45:55 GMT
> I didn't see data skew for that reducer. It has similar amount of REDUCE_INPUT_RECORDS as other reducers.
> …
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for join key [4092813312923569]


What matters is the ratio of REDUCE_INPUT_RECORDS to REDUCE_INPUT_GROUPS, i.e. the number of rows per distinct key, not the raw record count.
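
As a rough illustration of that ratio (a sketch only, not Hive or Tez code; the function name and the counter values below are made up), two reducers with identical REDUCE_INPUT_RECORDS can look very different once you divide by REDUCE_INPUT_GROUPS:

```python
def rows_per_group(reduce_input_records: int, reduce_input_groups: int) -> float:
    """Average number of rows per distinct key seen by one reducer."""
    if reduce_input_groups <= 0:
        raise ValueError("reduce_input_groups must be positive")
    return reduce_input_records / reduce_input_groups


# Hypothetical counter values: same record count, very different group counts.
reducers = {
    "reducer_a": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 500_000},
    "reducer_b": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 100},
}

ratios = {
    name: rows_per_group(c["REDUCE_INPUT_RECORDS"], c["REDUCE_INPUT_GROUPS"])
    for name, c in reducers.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} rows per key on average")
```

Keep in mind this is only an average: a single very heavy key can still hide behind a modest ratio when the reducer also handles many small keys.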

 

Row containers being spilled to disk means that at least one key in the join has more than 10000 values.
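
To find which keys cross that line, you can count rows per join key. The sketch below does it in plain Python on an in-memory sample; the helper name `heavy_keys` is hypothetical, and the default threshold simply reuses the 10000 figure above. On real data the same count would normally be a GROUP BY on the join key.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def heavy_keys(keys: Iterable[Hashable], threshold: int = 10000) -> Dict[Hashable, int]:
    """Return the join keys that occur more than `threshold` times."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > threshold}


# Made-up sample: only "a" has strictly more than 10000 values.
sample = ["a"] * 10001 + ["b"] * 10000 + ["c"] * 3
result = heavy_keys(sample)
print(result)
```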

If you have Tez, this comes up when you run the SkewAnalyzer.

https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/SkewAnalyzer.java#L41

 

Cheers,

Gopal

