hadoop-hdfs-user mailing list archives

From "萝卜丝炒饭" <1427357...@qq.com>
Subject Re: No Reducer scenarios
Date Fri, 03 Feb 2017 06:34:33 GMT
Hi Nair,
Did you find out which class it is? I tried to find it but failed. I know NewDirectOutputCollector
is used to write tmp files.

From: "☼ R Nair (रविशंकर नायर)"<ravishankar.nair@gmail.com>
Date: 2017/1/30 13:32:04
To: "dev"<dev@spark.apache.org>;"user"<user@hadoop.apache.org>;"user"<user@spark.apache.org>;
Subject: No Reducer scenarios

Dear all,

1) When we don't set the reducer class in the driver program, IdentityReducer is invoked.

2) When we call setNumReduceTasks(0), no reducer, not even IdentityReducer, is invoked.

Now, in the second scenario, we observed that the output files are named in the part-m-xx format
(instead of part-r-xx), which shows that they are map output. But we know that the output of a
map task is always written to intermediate files on the local file system. So which class is
responsible for taking these intermediate map outputs from the local file system and writing
them to HDFS? Does this class perform the write only when setNumReduceTasks is set to zero?
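
For illustration only, here is a small stand-alone sketch of the naming difference described above. It does not use Hadoop at all: the class PartFileNames and the helpers finalStage and partFileName are hypothetical names made up for this example, and the five-digit zero padding is an assumption based on the part-m-00000 style names usually seen in job output directories.

```java
import java.util.Locale;

// Hypothetical, Hadoop-free sketch: models only the file-name convention,
// not how or where the data is actually written.
public class PartFileNames {
    enum TaskKind { MAP, REDUCE }

    // Assumption for this sketch: with zero reduce tasks the map stage is
    // the final stage, otherwise the reduce stage is.
    static TaskKind finalStage(int numReduceTasks) {
        return numReduceTasks == 0 ? TaskKind.MAP : TaskKind.REDUCE;
    }

    // Builds a name such as part-m-00000 or part-r-00001.
    static String partFileName(TaskKind kind, int partition) {
        char tag = (kind == TaskKind.MAP) ? 'm' : 'r';
        return String.format(Locale.ROOT, "part-%c-%05d", tag, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(finalStage(0), 0)); // map-only job
        System.out.println(partFileName(finalStage(2), 1)); // job with reducers
    }
}
```

Under these assumptions, the first call models the setNumReduceTasks(0) case and the second models a job that still has reducers.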

Best, Ravion