hadoop-mapreduce-user mailing list archives

From Vinayakumar B <vinayakuma...@huawei.com>
Subject RE: how to specify key and value for an input to mapreduce job
Date Tue, 14 Feb 2012 15:28:32 GMT
Hi Vamshi,

1. To read input in which each line holds both the key and the value as text, use
KeyValueTextInputFormat (in the org.apache.hadoop.mapreduce.lib.input package) as the InputFormat
class of your Job. This input format uses KeyValueLineRecordReader, which reads each line and
splits it into a key and a value at the first occurrence of the separator.
Set the key/value separator with the following property in the job configuration:
"mapreduce.input.keyvaluelinerecordreader.key.value.separator"
By default this is '\t'.

2. By default the job writes its output with TextOutputFormat, and the default output key and
value classes are LongWritable and Text.
In your case you need Text as both key and value.
Since you were using the default TextInputFormat, you were getting the complete line as the value
and its byte offset in the file as the key. If you use KeyValueTextInputFormat instead, you will
get the desired result.
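
As a rough illustration of the splitting described in point 1, the plain-Java sketch below mimics
what happens to one line: everything before the first separator becomes the key, the rest becomes
the value, and a line without a separator becomes the key with an empty value. This is not
Hadoop's code (the real KeyValueLineRecordReader works on bytes); the class name
KeyValueSplitSketch and the method split are made up for the example, and the sample line assumes
the two columns in your files are separated by a tab.

```java
class KeyValueSplitSketch {

    /**
     * Splits a line at the FIRST occurrence of the separator.
     * Returns a two-element array: {key, value}.
     */
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator in the line: whole line is the key, value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Sample line taken from the question, assuming a tab between the columns.
        String[] kv = split("00015DEGgJ\t-HM", '\t');
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```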

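Putting the two points together, a driver could look roughly like the sketch below. It assumes
Hadoop 2.x or later on the classpath (on older releases, new Job(conf, name) takes the place of
Job.getInstance), relies on the default identity Mapper and Reducer, and takes the input and
output paths from the command line; the class name KeyValueJobSketch and the job name are
placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

class KeyValueJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Separator property from point 1; tab is already the default,
        // so this line only makes the choice explicit.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");

        // Assumption: Hadoop 2.x+ API. The job name is a placeholder.
        Job job = Job.getInstance(conf, "key-value-input-sketch");
        job.setJarByClass(KeyValueJobSketch.class);

        // Each input line is split into a Text key and a Text value.
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // No mapper or reducer is set, so the identity Mapper and Reducer
        // pass the (Text, Text) pairs through unchanged.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // args[0]: input directory (e.g. /user/HSOP), args[1]: output directory.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With Text set as both the output key and value class, each output line should consist of the key,
a tab, and the value, without the leading offset column.
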
Thanks and Regards,
Vinayakumar B
________________________________
From: Vamshi Krishna [vamshi2105@gmail.com]
Sent: Tuesday, February 14, 2012 8:28 PM
To: mapreduce-user@hadoop.apache.org
Subject: how to specify key and value for an input to mapreduce job

Hi all,
I have a job that reads all the rows from an HBase table and writes them to a location
in DFS, i.e. /user/HSOP. HSOP is a folder containing 9 files, each with content like:
00015DEGgJ    -HM
00016Pc4Tl    -HM
0001H0iImI    -HM
0001Oyb0Ju    -HM
0001hwBEOr    -HM
0002Qx2Uj9    -HM
0002jCs6gr    -HM
0003PMcWRa    -HM
000488xKIE    -HM

Both the first and second columns are of type Text, as specified in the first job's output
format class.

Now I want one more job to read all these files as input and treat the first column element
as the "key" and the second column element as the "value". For that I tried starting a job with
the line
 job.getConfiguration().set("key.value.separator.in.input.line", "-");

In the reduce() method I call context.write(key, value); where key is a LongWritable and value
is a Text. But when I look at the output of this job, the format is like this:

46    0002mCjpo9    -HM
253    000AxT9LSA    -HM
460    000FYtnxiB    -HM
667    000WNVBo9N    -HM
874    000dQiseKz    -HM

But I don't want that first column added to each row. How can I do that?
Could somebody please help?

