hadoop-common-user mailing list archives

From: newpant <newpant0...@gmail.com>
Subject: Re: KeyValueTextInputFormat
Date: Sat, 28 Aug 2010 04:00:21 GMT
Hi Mark, here is an example: say I have a dataset containing weather data like
the example in Hadoop: The Definitive Guide. The file is ASCII-encoded and we
use TextInputFormat, so the mapper's input key and value types are LongWritable
(the line offset in bytes) and Text (the data record). The mapper parses each
record and emits the year and the temperature, where the year is a Text and the
temperature is an IntWritable. The reducer takes the mapper output and finds
the maximum value for a given key.

In this case, the mapper's input type is <LongWritable, Text> and its output
type is <Text, IntWritable>. The reducer's input type is
<Text, Iterable<IntWritable>>, and it outputs <Text, IntWritable>.
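
To make those types concrete, here is a minimal sketch of such a mapper and
reducer (using the newer org.apache.hadoop.mapreduce API; the fixed-width
field offsets in the parsing are illustrative assumptions, not a real record
layout):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Input: <LongWritable, Text> (line offset, record); output: <Text, IntWritable>.
class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    // Hypothetical fixed-width fields: year, then temperature.
    String year = line.substring(15, 19);
    int temperature = Integer.parseInt(line.substring(87, 92).trim());
    context.write(new Text(year), new IntWritable(temperature));
  }
}

// Input: <Text, Iterable<IntWritable>>; output: <Text, IntWritable>.
class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values,
      Context context) throws IOException, InterruptedException {
    int max = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      max = Math.max(max, value.get());
    }
    context.write(key, new IntWritable(max));
  }
}

Note that the generic parameters are erased at runtime; what the framework
actually uses for serialization are the classes registered in the job
configuration (setMapOutputKeyClass and friends), so the two must agree.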


2010/8/27 Mark <static.void.dev@gmail.com>

>   On 8/26/10 7:47 PM, newpant wrote:
>
>> Hi, do you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to
>> set the input format class? The default input format class is
>> TextInputFormat, whose key type is LongWritable, which stores the offset
>> of each line in the file (in bytes).
>>
>> If your mapper's output key or value types differ from the job's output
>> types, you need to call setMapOutputKeyClass and setMapOutputValueClass.
>>
>> 2010/8/27 Mark <static.void.dev@gmail.com>
>>
>>> When I configure my job to use KeyValueTextInputFormat, doesn't that
>>> imply that both the key and the value passed to my mapper will be Text?
>>>
>>> I have it set up like this, and I am using the default Mapper.class,
>>> i.e. the identity mapper:
>>> - KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
>>>
>>> but I keep receiving this error:
>>> - java.lang.ClassCastException: org.apache.hadoop.io.LongWritable
>>>   cannot be cast to org.apache.hadoop.io.Text
>>>
>>> I would expect this error if I were using FileInputFormat, because that
>>> returns the key as a LongWritable and the value as Text, but I am
>>> unsure why it's happening here.
>>>
>>> Also, on the same note, when I supply FileInputFormat or
>>> KeyValueTextInputFormat, does that implicitly set job.setMapOutputKeyClass
>>> and job.setMapOutputValueClass? When are these used?
>>>
>>> Thanks for the clarification
>
> No, I didn't set that, and when I did, everything worked as expected. I
> thought if I used:
>
>
> KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]))
>
>
> it would set that for me, or at least know that the input would be
> Text/Text. I'm guessing that is wrong.
>
>
> If your mapper's output key or value types differ from the job's output
> types, you need to call setMapOutputKeyClass and setMapOutputValueClass.
>
> When would this ever come up? Does it just cast to the appropriate classes
> then?
>
> Thanks
>
>
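
To tie this together, here is a minimal driver sketch of the setup being
described (assuming the newer org.apache.hadoop.mapreduce API and a release
that ships KeyValueTextInputFormat for it; KeyValueExample and "kv-example"
are placeholder names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class KeyValueExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    Job job = new Job(conf, "kv-example");
    job.setJarByClass(KeyValueExample.class);

    // addInputPath only registers the path; it does NOT choose the format.
    // Without the next line the job falls back to TextInputFormat, whose
    // LongWritable keys produce exactly the ClassCastException above.
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

    // The base Mapper and Reducer pass records through unchanged, and
    // KeyValueTextInputFormat feeds the mapper <Text, Text>.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    // No casting happens anywhere: the map output classes simply default
    // to the job's output classes below, and setMapOutputKeyClass /
    // setMapOutputValueClass are only needed when the map output types
    // differ from these final ones.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}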
