hadoop-mapreduce-user mailing list archives

From Akira AJISAKA <ajisa...@oss.nttdata.co.jp>
Subject Re: question about reduce method
Date Mon, 17 Feb 2014 18:14:02 GMT
Moving to user@hadoop.apache.org.

If you have a question about this, please reply to the
user mailing list instead of mapreduce-dev@.

Thanks,
Akira

(2014/02/17 10:06), Akira AJISAKA wrote:
>> I understand that the map method turns these text files into key-value pairs like the following, right?
>> <001, 35.99>
>> <001, 35.99>
>> <002, 12.49>
>> <004, 13.42>
>> <003, 499.99>
>> <001 ,78.95>
>> <002, 21.99>
>> <002, 93.45>
>> <001, 9.99>
>> <001, John Allen>
>> <002, Abigail Smith>
>> <003, April Stevens>
>> <004, Nasser Hafez>
> 
> The following map outputs are the correct ones:
> 
> <001,sales	35.99>
> <002,sales	12.49>
> <004,sales	13.42>
> <003,sales	499.99>
> <001,sales	78.95>
> <002,sales	21.99>
> <002,sales	93.45>
> <001,sales	9.99>
> <001,accounts	John Allen>
> <002,accounts	Abigail Smith>
> <003,accounts	April Stevens>
> <004,accounts	Nasser Hafez>
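
The tagging shown above can be sketched in plain Java without a Hadoop cluster. This is only an illustration, not the job itself: the class `MapOutputSketch` and the helper `mapRecord` are hypothetical names, and the input lines are assumed to be tab-separated, as the `split("\t")` calls in the posted mappers imply.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class MapOutputSketch {
    // Mimics what each posted mapper emits: the key is the first tab-separated
    // field, the value is a source tag followed by the second field.
    // (Hypothetical helper, not part of the Hadoop API.)
    static Map.Entry<String, String> mapRecord(String tag, String line) {
        String[] parts = line.split("\t");
        return new SimpleEntry<>(parts[0], tag + "\t" + parts[1]);
    }

    public static void main(String[] args) {
        // One line from each input file, assumed to be tab-separated.
        Map.Entry<String, String> sale =
            mapRecord("sales", "001\t35.99\t2012-03-15");
        Map.Entry<String, String> account =
            mapRecord("accounts", "001\tJohn Allen\tStandard\t2012-03-15");

        System.out.println("<" + sale.getKey() + "," + sale.getValue() + ">");
        System.out.println("<" + account.getKey() + "," + account.getValue() + ">");
    }
}
```

The tag is what lets a single reducer tell sales values apart from account values once both arrive under the same key.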
> 
> The map outputs are grouped and sorted by key, and reducers process each
> group. The inputs of the reduce method are as follows:
> 
> <key: 001,
>   values: {sales 35.99, sales 78.95, sales 9.99, accounts John Allen}>
> <key: 002,
>   values: {sales 12.49, sales 21.99, sales 93.45, accounts Abigail Smith}>
> <key: 003,
>   values: {sales 499.99, accounts April Stevens}>
> <key: 004,
>   values: {sales 13.42, accounts Nasser Hafez}>
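
To make the grouping step concrete, here is a minimal plain-Java sketch that simulates it outside Hadoop. `ReduceSketch`, `group`, and `reduce` are hypothetical names; `reduce` mirrors the loop in the posted `ReduceJoinReducer`, except that it parses with `Double.parseDouble` and formats to two decimals. The posted job uses `Float.parseFloat` and `%f`, which is consistent with totals such as 124.929998 in the posted output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {
    // Collects all values that share a key. A TreeMap keeps the keys sorted,
    // which stands in for the sort that happens between map and reduce.
    static Map<String, List<String>> group(String[][] mapOutputs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutputs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Called once per key: counts and sums the "sales" values and picks up
    // the name from the "accounts" value.
    static String reduce(List<String> values) {
        String name = "";
        double total = 0.0;
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;
                total += Double.parseDouble(parts[1]);
            } else if (parts[0].equals("accounts")) {
                name = parts[1];
            }
        }
        return name + "\t" + count + "\t" + String.format(Locale.ROOT, "%.2f", total);
    }

    public static void main(String[] args) {
        // A subset of the tagged map outputs listed earlier in the thread.
        String[][] mapOutputs = {
            {"001", "sales\t35.99"}, {"002", "sales\t12.49"},
            {"001", "sales\t78.95"}, {"001", "sales\t9.99"},
            {"001", "accounts\tJohn Allen"}, {"002", "accounts\tAbigail Smith"},
        };
        for (Map.Entry<String, List<String>> e : group(mapOutputs).entrySet()) {
            System.out.println(reduce(e.getValue()));
        }
    }
}
```

The order of values within one key is not guaranteed in a real job unless a secondary sort is configured, so the reduce logic should not rely on the accounts record arriving first or last. The loop above does not.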
> 
> Regards,
> Akira
> 
> (2014/02/17 1:14), EdwardKing wrote:
>> Hello everyone,
>>      I am a newbie to Hadoop 2.2.0 and I am puzzled by the reduce method. I have two text files, sales.txt and account.txt, as follows:
>> sales.txt
>> 001 35.99 2012-03-15
>> 002 12.49 2004-07-02
>> 004 13.42 2005-12-20
>> 003 499.99 2010-12-20
>> 001 78.95 2012-04-02
>> 002 21.99 2006-11-30
>> 002 93.45 2008-09-10
>> 001 9.99 2012-05-17
>>
>> account.txt
>> 001 John Allen Standard 2012-03-15
>> 002 Abigail Smith Premium 2004-07-13
>> 003 April Stevens Standard 2010-12-20
>> 004 Nasser Hafez Premium 2001-04-23
>>
>> ReduceJoin.java is as follows:
>> import java.io.IOException;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.Reducer;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>> import org.apache.hadoop.mapreduce.lib.input.MultipleInputs ;
>> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat ;
>>
>> public class ReduceJoin
>> {
>>       
>>       public static class SalesRecordMapper
>>       extends Mapper<Object, Text, Text, Text>{
>>           
>>           public void map(Object key, Text value, Context context
>>           ) throws IOException, InterruptedException
>>           {
>>               String record = value.toString() ;
>>               String[] parts = record.split("\t") ;
>>               
>>               context.write(new Text(parts[0]), new Text("sales\t"+parts[1])) ;
>>           }
>>       }
>>       
>>       public static class AccountRecordMapper
>>       extends Mapper<Object, Text, Text, Text>{
>>           
>>           public void map(Object key, Text value, Context context
>>           ) throws IOException, InterruptedException
>>           {
>>               String record = value.toString() ;
>>               String[] parts = record.split("\t") ;
>>               
>>               context.write(new Text(parts[0]), new Text("accounts\t"+parts[1])) ;
>>           }
>>       }
>>       
>>       public static class ReduceJoinReducer
>>       extends Reducer<Text, Text, Text, Text>
>>       {
>>           
>>           public void reduce(Text key, Iterable<Text> values,
>>               Context context
>>               ) throws IOException, InterruptedException
>>           {
>>               String name = "" ;
>>               double total = 0.0 ;
>>               int count = 0 ;
>>               
>>               for(Text t: values)
>>               {
>>                   String parts[] = t.toString().split("\t") ;
>>                   
>>                   if (parts[0].equals("sales"))
>>                   {
>>                       count++ ;
>>                       total+= Float.parseFloat(parts[1]) ;
>>                   }
>>                   else if (parts[0].equals("accounts"))
>>                   {
>>                       name = parts[1] ;
>>                   }
>>               }
>>               
>>               String str = String.format("%d\t%f", count, total) ;
>>               context.write(new Text(name), new Text(str)) ;
>>           }
>>       }
>>       
>>       public static void main(String[] args) throws Exception {
>>           Configuration conf = new Configuration();
>>           // Job.getInstance replaces the deprecated Job constructor in Hadoop 2.x.
>>           Job job = Job.getInstance(conf, "Reduce-side join");
>>           job.setJarByClass(ReduceJoin.class);
>>           job.setReducerClass(ReduceJoinReducer.class);
>>           job.setOutputKeyClass(Text.class);
>>           job.setOutputValueClass(Text.class);
>>           MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
>>               SalesRecordMapper.class) ;
>>           MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,
>>               AccountRecordMapper.class) ;
>>           Path outputPath = new Path(args[2]);
>>           FileOutputFormat.setOutputPath(job, outputPath);
>>           // Remove any previous output directory (recursively) so the job can be rerun.
>>           outputPath.getFileSystem(conf).delete(outputPath, true);
>>           
>>           System.exit(job.waitForCompletion(true) ? 0 : 1);
>>       }
>> }
>>
>> I created join.jar and ran it:
>> $ hadoop jar join.jar ReduceJoin sales accounts outputs
>> $ hadoop fs -cat /user/garry/outputs/part-r-00000
>> John Allen 3 124.929998
>> Abigail Smith 3 127.929996
>> April Stevens 1 499.989990
>> Nasser Hafez 1 13.420000
>>
>> I understand that the map method turns these text files into key-value pairs like the following, right?
>> <001, 35.99>
>> <001, 35.99>
>> <002, 12.49>
>> <004, 13.42>
>> <003, 499.99>
>> <001 ,78.95>
>> <002, 21.99>
>> <002, 93.45>
>> <001, 9.99>
>> <001, John Allen>
>> <002, Abigail Smith>
>> <003, April Stevens>
>> <004, Nasser Hafez>
>>
>> But I don't understand the reduce method. How does it produce the following result? Could anyone give the detailed steps that produce it? Thanks in advance.
>> John Allen 3 124.929998
>> Abigail Smith 3 127.929996
>> April Stevens 1 499.989990
>> Nasser Hafez 1 13.420000
>>
>>
> 