hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: How hadoop parse input files into (Key,Value) pairs ??
Date Thu, 05 May 2011 11:39:41 GMT
Hadoop uses an InputFormat class to parse files and generate (key,
value) pairs for your Mapper. An InputFormat is any class that extends
the abstract base class:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/InputFormat.html

The default InputFormat, TextInputFormat, parses text files, generating
keys that are byte offsets into the file and values that are complete
lines of text:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
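
To make the (byte offset, line) pairing concrete, here is a small
plain-Java sketch that mimics what the Mapper sees for a text file. It
does not use Hadoop at all; the class name OffsetLineDemo and the
method toRecords() are made up for illustration, and it only handles
"\n" line endings, so treat it as a simulation rather than Hadoop's
actual record reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: simulates the (key, value)
// pairs a Mapper receives for a text file, where the key is the byte
// offset of the line start and the value is the line itself.
public class OffsetLineDemo {

    public static List<Map.Entry<Long, String>> toRecords(byte[] data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0; // byte offset where the current line begins
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            // Emit a record at each newline, and at end of input if a
            // final line without a trailing newline is still pending.
            if (atEnd ? start < i : data[i] == '\n') {
                String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                records.add(new SimpleEntry<>((long) start, line));
                start = i + 1; // next line starts after the newline
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "hello world\nfoo\nbar baz\n".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> r : toRecords(data)) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

For the three-line input in main(), the keys come out as 0, 12 and 16:
each key is the number of bytes that precede the line in the file.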

You can write your own InputFormat and configure your job to use it by
calling setInputFormatClass() on your Job before submitting it:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/Job.html#setInputFormatClass(java.lang.Class)
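
As a rough sketch of how that fits together with the new
(org.apache.hadoop.mapreduce) API, something like the following should
work. The class names CustomInputFormatDriver and MyLineInputFormat are
made up for this example, and the custom format simply reuses Hadoop's
LineRecordReader, so it behaves like the default; a real custom format
would return its own RecordReader. It assumes the Hadoop jars are on
the classpath and that args[0] and args[1] are the input and output
paths.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomInputFormatDriver {

    // Hypothetical custom InputFormat. The type parameters decide which
    // (key, value) types the Mapper receives. FileInputFormat already
    // handles splitting the input files; we only supply the reader.
    public static class MyLineInputFormat extends FileInputFormat<LongWritable, Text> {
        @Override
        public RecordReader<LongWritable, Text> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            // Stand-in: (byte offset, line) records, same as the default.
            return new LineRecordReader();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The Job(Configuration, String) constructor matches the 0.20
        // API linked above; newer releases prefer Job.getInstance().
        Job job = new Job(conf, "custom-input-format-example");
        job.setJarByClass(CustomInputFormatDriver.class);

        // Tell the job which InputFormat to use.
        job.setInputFormatClass(MyLineInputFormat.class);

        // No Mapper or Reducer is set, so the identity classes are used
        // and the output types match what the InputFormat produces.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that a Mapper written against the old API (the one with
OutputCollector and Reporter) is configured through JobConf instead,
where the corresponding method is setInputFormat().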

Hope that helps.

-Joey

P.S. I moved this over to the mapreduce-user alias since it's
MapReduce specific.

On Thu, May 5, 2011 at 7:31 AM, praveenesh kumar <praveenesh@gmail.com> wrote:
> Hi,
>
> As we know, a Hadoop mapper takes (Key,Value) pairs as input and generates
> intermediate (Key,Value) pairs, and usually we give our Mapper a text file
> as input.
> How does Hadoop understand this and parse our input text file into
> (Key,Value) pairs?
>
> Usually our mapper looks like  --
> public void map(LongWritable key, Text value, OutputCollector<Text, Text>
> outputCollector, Reporter reporter) throws IOException {
>
> String word = value.toString();
>
> //Some lines of code
>
> }
>
> So if I pass any text file as input, it takes every line as the VALUE for
> the Mapper, on which I will do some processing and put the result into the
> OutputCollector. But how does Hadoop parse my text file into (Key,Value)
> pairs, and how can we tell Hadoop what (key,value) it should give to the
> mapper?
>
> Thanks.
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
