hadoop-common-user mailing list archives

From jagaran das <jagaran_...@yahoo.co.in>
Subject Re: No. of Map and reduce tasks
Date Fri, 27 May 2011 02:21:15 GMT
If you feed in really small files, the benefit of Hadoop's large block size
goes away.
Instead, try merging the files first.
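To illustrate the merge idea (file names here are hypothetical, and the upload step assumes a running cluster), the small files can be concatenated locally before going into HDFS:

```shell
# Merge many small local logs into one file so each HDFS block holds
# substantial data; the part-*.log names are illustrative stand-ins.
printf 'line a\n' > part-1.log       # sample small files, for demonstration
printf 'line b\n' > part-2.log
cat part-*.log > merged.log          # the actual local merge
wc -l < merged.log                   # prints the merged line count (2 here)
# hadoop fs -put merged.log /input/  # then upload (requires a cluster)
```

For files already in HDFS, `hadoop fs -getmerge` pulls them down as one local file that can be re-uploaded.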

Hope that helps



________________________________
From: James Seigel <james@tynt.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, 26 May, 2011 6:04:07 PM
Subject: Re: No. of Map and reduce tasks

Set the input split size really low; you might get something.

I'd rather you fire up some *nix commands, pack that file
onto itself a bunch of times, then put it back into HDFS and let 'er
rip.
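A minimal sketch of the "pack it onto itself" idea (the seed file's contents and the loop count are arbitrary stand-ins):

```shell
# Duplicate a small file 8 times so the combined input is large enough
# to span more than one split.
printf 'sample query\n' > excite-small.log   # stand-in seed file for the demo
: > excite-big.log                           # start with an empty output file
for i in 1 2 3 4 5 6 7 8; do
  cat excite-small.log >> excite-big.log
done
wc -l < excite-big.log               # 8x the seed file's line count
# hadoop fs -put excite-big.log /input/      # re-upload (cluster assumed)
```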

Sent from my mobile. Please excuse the typos.

On 2011-05-26, at 4:56 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> I think I understand that from the last 2 replies :)  But my question is:
> can I change this configuration to split the file into 250K chunks so
> that multiple mappers can be invoked?
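One hedged way to attempt this: cap the maximum split size below the file size. This is a sketch, not a guarantee; it only helps if the loader's InputFormat honors the property, and the script name is hypothetical. The property shown is the Hadoop 0.20-era name (later releases renamed it `mapreduce.input.fileinputformat.split.maxsize`).

```shell
# Hypothetical invocation: cap splits at 64 KB so a ~200 KB input file
# is carved into several splits, each feeding its own mapper.
pig -Dmapred.max.split.size=65536 order_by_name.pig
```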
>
> On Thu, May 26, 2011 at 3:41 PM, James Seigel <james@tynt.com> wrote:
>> have more data for it to process :)
>>
>>
>> On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
>>
>>> I ran a simple pig script on this file:
>>>
>>> -rw-r--r-- 1 root root   208348 May 26 13:43 excite-small.log
>>>
>>> that orders the contents by name. But it only created one mapper. How
>>> can I change this to distribute across multiple machines?
>>>
>>> On Thu, May 26, 2011 at 3:08 PM, jagaran das <jagaran_das@yahoo.co.in> wrote:
>>>> Hi Mohit,
>>>>
>>>> No. of maps - it depends on the total file size / block size.
>>>> No. of reducers - you can specify that yourself.
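Worked through for the file in this thread (assuming the then-default 64 MB block size, which is an assumption, not something stated in the thread):

```shell
# Number of map tasks is roughly ceil(total input size / block size).
FILE_SIZE=208348                     # bytes: the excite-small.log above
BLOCK_SIZE=$((64 * 1024 * 1024))     # assumed 64 MB default block size
MAPS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$MAPS"                         # the file fits in one block: one mapper
# Reducers, by contrast, can be set explicitly, e.g. (0.20-era property):
# hadoop jar job.jar MyJob -D mapred.reduce.tasks=4 /input /output
```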
>>>>
>>>> Regards,
>>>> Jagaran
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Mohit Anchlia <mohitanchlia@gmail.com>
>>>> To: common-user@hadoop.apache.org
>>>> Sent: Thu, 26 May, 2011 2:48:20 PM
>>>> Subject: No. of Map and reduce tasks
>>>>
>>>> How can I tell how the map and reduce tasks were spread across the
>>>> cluster? I looked at the jobtracker web page but can't find that info.
>>>>
>>>> Also, can I specify how many map or reduce tasks I want to be launched?
>>>>
>>>> From what I understand, it's based on the number of input files
>>>> passed to Hadoop. So if I have 4 files, there will be 4 map tasks
>>>> launched, and the number of reducers depends on the hash partitioner.
>>>>
>>
>>
