hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: How hadoop parse input files into (Key,Value) pairs ??
Date Thu, 05 May 2011 11:39:41 GMT
Hadoop uses an InputFormat class to parse files and generate key,
value pairs for your Mapper. An InputFormat is any class which extends
the base abstract class, org.apache.hadoop.mapreduce.InputFormat. It
has two responsibilities: getSplits() carves the input into
InputSplits, and createRecordReader() supplies a RecordReader that
turns each split into key/value records.


The default InputFormat, TextInputFormat, parses text files and
generates keys which are byte offsets (LongWritable) and values which
are complete lines of text (Text).
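To make the byte-offset keying concrete, here is a small plain-Java sketch (outside Hadoop, purely for illustration; the real classes are LongWritable and Text, and the class and method names here are made up) of how each line of a file ends up keyed by the byte offset at which it starts:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetDemo {
    // Mimics TextInputFormat's keying: each line is keyed by the byte
    // offset at which it starts in the file.
    static Map<Long, String> toRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long fileBytes = fileContents.getBytes(StandardCharsets.UTF_8).length;
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            // Skip the empty trailing element produced after a final newline.
            if (offset < fileBytes) {
                records.put(offset, line);
            }
            // +1 accounts for the newline terminator that split() removed.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(toRecords("hello\nworld\n")); // {0=hello, 6=world}
    }
}
```

Running it on a two-line file shows "hello" keyed by offset 0 and "world" keyed by offset 6, which is exactly the (key, value) shape your Mapper receives.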


You can write your own InputFormat and configure your job to use it by
calling setInputFormatClass() on your Job (or setInputFormat() on a
JobConf with the old mapred API) before submitting it.
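As an example of a different keying scheme, Hadoop ships a KeyValueTextInputFormat that splits each line at the first tab byte, using the text before the tab as the key and the rest as the value. A plain-Java sketch of just that splitting rule (the class and method names here are made up for illustration; they are not the Hadoop API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class KeyValueLineDemo {
    // Mimics KeyValueTextInputFormat's rule: key = text before the first
    // tab, value = text after it. A line with no tab becomes (line, "").
    static Map.Entry<String, String> splitLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = splitLine("user42\tclicked");
        System.out.println(e.getKey() + " -> " + e.getValue()); // user42 -> clicked
    }
}
```

A custom InputFormat lets you pick whatever keying rule fits your data; this tab-splitting rule is just one that Hadoop already provides.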


Hope that helps.


P.S. I moved this over to the mapreduce-user alias since it's
MapReduce specific.

On Thu, May 5, 2011 at 7:31 AM, praveenesh kumar <praveenesh@gmail.com> wrote:
> Hi,
> As we know, a Hadoop mapper takes input as (Key,Value) pairs and generates
> intermediate (Key,Value) pairs, and usually we give input to our Mapper as a
> text file.
> How does Hadoop understand this and parse our input text file into
> (Key,Value) pairs?
> Usually our mapper looks like  --
> public void map(LongWritable key, Text value, OutputCollector<Text, Text>
> outputCollector, Reporter reporter) throws IOException {
> String word = value.toString();
> //Some lines of code
> }
> So if I pass any text file as input, it is taking every line as the VALUE to
> the Mapper, on which I will do some processing and put it to the OutputCollector.
> But how does Hadoop parse my text file into (Key,Value) pairs, and how can we
> tell Hadoop which (key,value) it should give to the mapper?
> Thanks.

Joseph Echeverria
Cloudera, Inc.
