hadoop-common-user mailing list archives

From Praveen Mothkuri <Praveen.Mothk...@kpit.com>
Subject RE: No Reducer scenarios
Date Fri, 03 Feb 2017 11:46:25 GMT
In this case, the output of the map tasks goes directly to the distributed file system, to the path
set by FileOutputFormat.setOutputPath(JobConf, Path)<https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#setOutputPath(org.apache.hadoop.mapred.JobConf,%20org.apache.hadoop.fs.Path)>.
Also, the framework doesn't sort the map outputs before writing them out to HDFS.
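
For reference, a minimal sketch of a map-only driver is below. It assumes the newer org.apache.hadoop.mapreduce API rather than the mapred API linked above; there the equivalent call is FileOutputFormat.setOutputPath(Job, Path). The class names MapOnlyJob and PassThroughMapper and the helper method configure are made up for illustration, and the input and output paths are taken from the command line.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    // Hypothetical mapper: emits every input record unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(offset, line);
        }
    }

    // Hypothetical helper: builds a job with zero reduce tasks.
    public static Job configure(Configuration conf, Path in, Path out)
            throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Zero reduce tasks: no reducer runs, and the map output is
        // written to the job output path without being sorted.
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration(),
                            new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run with an input and an output directory as arguments, this should leave the part-m-xx files described in the original question under the output directory.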

From: 萝卜丝炒饭 [mailto:1427357147@qq.com]
Sent: Friday, February 03, 2017 12:05 PM
To: ☼ R Nair (रविशंकर नायर); dev; user; user
Subject: Re: No Reducer scenarios

Hi Nair,
Do you know which class it is, please? I tried to find it but failed. I know NewDirectOutputCollector
is used to write temporary files.
---Original---
From: "☼ R Nair (रविशंकर नायर)" <ravishankar.nair@gmail.com>
Date: 2017/1/30 13:32:04
To: "dev" <dev@spark.apache.org>; "user" <user@hadoop.apache.org>; "user" <user@spark.apache.org>
Subject: No Reducer scenarios

Dear all,


1) When we don't set the reducer class in the driver program, IdentityReducer is invoked.

2) When we call setNumReduceTasks(0), no reducer is invoked, not even IdentityReducer.

Now, in the second scenario, we observed that the output files are named in the part-m-xx format
(instead of the part-r-xx format), which shows that they are map output. But we know that the
output of Map is always written to the intermediate local file system. So which class is
responsible for taking this intermediate map output from the local file system and writing it
to HDFS? Does this class perform the write only when setNumReduceTasks is set to zero?

Best, Ravion