hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Who actually does the split computation?
Date Wed, 09 Feb 2011 21:52:24 GMT
On Wed, Feb 9, 2011 at 1:49 PM, Sean Bigdatafun
<sean.bigdatafun@gmail.com> wrote:

> Where does this computation happen (in the context of the original picture
> in the posted link )?
>
> JobClient? or JobTracker? (Either way I think they need to contact HDFS
> Namenode to do such a work, which did not seem to get described in that
> link) --- I can't post on mapreduce-user mailing list, so I have to ask it
> here.
>
>
Happens in the JobClient. See o.a.h.mapreduce.JobSubmitter.java:357 in
trunk.

The InputFormat's getSplits() method will call out to the NN to find the
block locations for the input files. See the implementation of
FileInputFormat for details.
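
To give a rough idea of what that client-side computation looks like, here
is a minimal standalone sketch of the size arithmetic that FileInputFormat
applies per file. It is not the Hadoop source: the class SplitSketch and its
Split type are hypothetical names made up for illustration, and the sketch
leaves out the NN lookup entirely (the real code also attaches the hosts of
each block to the split). It only assumes the usual rule of
max(minSize, min(maxSize, blockSize)) with a 10% slop on the last split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
public class SplitSketch {
    // The tail of a file is folded into the previous split unless more
    // than 1.1 split sizes remain.
    public static final double SPLIT_SLOP = 1.1;

    /** Hypothetical stand-in for a file split: byte offset plus length. */
    public static final class Split {
        public final long start;
        public final long length;

        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "start=" + start + " length=" + length;
        }
    }

    /** Clamp the block size between the configured min and max split size. */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Carve one file of fileLength bytes into splits (block locations omitted). */
    public static List<Split> getSplits(long fileLength, long blockSize,
                                        long minSize, long maxSize) {
        List<Split> splits = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;

        // Emit full-size splits while more than SPLIT_SLOP split sizes remain.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        // Whatever is left (possibly slightly over one split size) is the last split.
        if (remaining != 0) {
            splits.add(new Split(fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 200 MB file with 64 MB blocks and no min/max constraints.
        for (Split s : getSplits(200 * mb, 64 * mb, 1L, Long.MAX_VALUE)) {
            System.out.println(s);
        }
    }
}
```

All of this runs in the submitting client, before the job is handed to the
JobTracker, which is why the client is the one talking to the NN.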

-Todd


> On Wed, Feb 9, 2011 at 1:13 PM, David Rosenstrauch <darose@darose.net>wrote:
>
>> On 02/09/2011 04:09 PM, Sean Bigdatafun wrote:
>>
>>> 1. My first question: who is responsible to compute the input splits?
>>>
>>
>> The InputFormat computes InputSplits.  See:
>> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/InputFormat.html
>>
>> DR
>>
>
>
>
> --
> --Sean
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
