hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Hadoop Runner
Date Sun, 12 Jun 2011 04:42:17 GMT

I may not have understood your question exactly, but you can do further
processing inside the RecordReader implementation of your FileInputFormat
subclass, just before it loads the value for a next()-style call (which
is what the MapRunner uses to read records).
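
To illustrate the RecordReader suggestion above, here is a minimal, self-contained sketch in plain Java. It does not use the Hadoop classes: `SimpleRecordReader` is a hypothetical stand-in that models only the `next(key, value)` method of the old-API `org.apache.hadoop.mapred.RecordReader`, with `AtomicLong` and `StringBuilder` standing in for reusable key and value objects such as `LongWritable` and `Text`. `ListLineReader`, `UpperCasingReader`, and the upper-casing step are likewise invented for illustration. In a real job the wrapper would presumably be returned from the FileInputFormat subclass's `getRecordReader(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the old-API RecordReader: only next(key, value)
// is modelled. The caller owns the key/value objects and they are refilled
// on every call, as with Hadoop's Writable types.
interface SimpleRecordReader {
    boolean next(AtomicLong key, StringBuilder value);
}

// Hypothetical line reader playing the role of the reader that
// TextInputFormat would normally supply (key = byte offset, value = line).
class ListLineReader implements SimpleRecordReader {
    private final List<String> lines;
    private int index = 0;
    private long offset = 0;

    ListLineReader(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public boolean next(AtomicLong key, StringBuilder value) {
        if (index >= lines.size()) {
            return false; // end of the split
        }
        String line = lines.get(index++);
        key.set(offset);
        value.setLength(0);
        value.append(line);
        offset += line.length() + 1; // +1 for the newline separator
        return true;
    }
}

// Wrapper that post-processes each value right after the underlying reader
// loads it and before the caller (the map runner) sees it.
class UpperCasingReader implements SimpleRecordReader {
    private final SimpleRecordReader delegate;

    UpperCasingReader(SimpleRecordReader delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean next(AtomicLong key, StringBuilder value) {
        if (!delegate.next(key, value)) {
            return false;
        }
        // Illustrative transformation only; replace with the real processing.
        String processed = value.toString().toUpperCase(Locale.ROOT);
        value.setLength(0);
        value.append(processed);
        return true;
    }
}

class RecordReaderSketch {
    // Mimics the read loop of a map runner: one record in memory at a time.
    static List<String> readAll(SimpleRecordReader reader) {
        List<String> seen = new ArrayList<>();
        AtomicLong key = new AtomicLong();
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            seen.add(key.get() + ":" + value);
        }
        return seen;
    }

    public static void main(String[] args) {
        SimpleRecordReader reader =
                new UpperCasingReader(new ListLineReader(List.of("ab", "cde")));
        System.out.println(readAll(reader));
    }
}
```

Because the wrapper transforms each record as it is read, nothing beyond the current record is buffered, which avoids copying the whole split into a separate variable first.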

If you're looking to dig into Hadoop's source code to understand the
flow yourself, MapTask.java is what you may be looking for (run*

On Sun, Jun 12, 2011 at 3:25 AM, Mark question <markq2011@gmail.com> wrote:
> Hi,
>  1) Where can I find the "main" class of hadoop? The one that calls the
> InputFormat then the MapperRunner and ReducerRunner and others?
>    This will help me understand what is in memory or still on disk, and the exact
> flow of data between splits and mappers.
> My problem is, assuming I have a TextInputFormat and would like to modify
> the input in memory before being read by RecordReader... where shall I do
> that?
>    InputFormat was my first guess, but unfortunately it only defines the
> logical splits. So the only way I can think of is to use the RecordReader
> to read all the records in the split into another variable (in the format I
> want), then process that variable with the map functions.
>    But is that efficient? To understand this, I hope someone can give an
> answer to Q(1).
> Thank you,
> Mark

Harsh J
