hadoop-user mailing list archives

From Jagat Singh <jagatsi...@gmail.com>
Subject Re: Understanding of the hadoop distribution system (tuning)
Date Tue, 11 Sep 2012 02:17:31 GMT
Hello Elaine,

You did not mention your cluster size: the number of nodes and the number of cores in each node.

What sort of work are you doing? 6 hours for 518MB of data is a very long time.

The number of map tasks would be about 518/64, i.e. 8.

So that many map tasks need to run to process your data.
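
As a rough sketch of that arithmetic, assuming the input is a single splittable file and the split size equals the 64MB block size: the helper name `estimate_map_tasks` is made up for illustration, and the 1.1 tolerance on the last split is modeled on how FileInputFormat is understood to size splits, so treat it as an assumption rather than a guarantee for your Hadoop version or input format.

```python
MB = 1024 * 1024

# Assumption: the final split may be up to 1.1x the split size before
# it is cut into two, similar to FileInputFormat's split slop.
SPLIT_SLOP = 1.1


def estimate_map_tasks(file_size_bytes, split_size_bytes, slop=SPLIT_SLOP):
    """Estimate the number of input splits (map tasks) for one splittable file."""
    if file_size_bytes <= 0 or split_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    splits = 0
    remaining = file_size_bytes
    # Carve off full-size splits while the remainder is clearly larger
    # than one split.
    while remaining / split_size_bytes > slop:
        splits += 1
        remaining -= split_size_bytes
    # Whatever is left becomes the last (possibly slightly larger) split.
    if remaining > 0:
        splits += 1
    return splits


print(estimate_map_tasks(518 * MB, 64 * MB))  # 8
```

Under these assumptions the last 70MB is kept as one split, which is consistent with the "Maps Total = 8" reported for this job. Input spread over several files, or a non-splittable format, would give a different count.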

Now they can run on a single node or on multiple nodes, depending on the
available slots. Did you check the JobTracker page while the job was running?
There you can see on which node each task is being processed. You can go to
the Running tasks page.
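
To illustrate why the slot count matters, here is a minimal sketch that estimates how many "waves" of map tasks are needed. It assumes every configured slot is free and ignores data locality and scheduler behavior; the function name `map_waves` and the node counts are hypothetical, since the cluster size was not given.

```python
import math


def map_waves(num_map_tasks, num_nodes, map_slots_per_node):
    """Estimate how many rounds of map tasks run when all slots are free."""
    total_slots = num_nodes * map_slots_per_node
    if total_slots <= 0:
        raise ValueError("the cluster needs at least one map slot")
    # Each wave can run at most total_slots tasks in parallel.
    return math.ceil(num_map_tasks / total_slots)


# 8 map tasks, one node with 20 map slots: everything fits in one wave,
# so all tasks could end up on that single node.
print(map_waves(8, 1, 20))  # 1

# 8 map tasks, two nodes with 2 map slots each: two waves are needed.
print(map_waves(8, 2, 2))  # 2
```

With 20 map slots per TaskTracker and only 8 map tasks, a single node can hold the whole job, so seeing all tasks on one node would not by itself mean something is misconfigured.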


Jagat Singh

On Tue, Sep 11, 2012 at 11:56 AM, Elaine Gan <elaine-gan@gmo.jp> wrote:

> Hi,
> I'm new to Hadoop and I've just played around with MapReduce.
> I would like to check whether my understanding of Hadoop is correct, and I
> would appreciate it if anyone could correct me if I'm wrong.
> I have around 518MB of data, and I wrote an MR program to process it.
> Here are some of my settings in my mapred-site.xml.
> ---------------------------------------------------------------
> mapred.tasktracker.map.tasks.maximum = 20
> mapred.tasktracker.reduce.tasks.maximum = 20
> ---------------------------------------------------------------
> My block size is the default, 64MB.
> With my data size = 518MB, I guess setting the maximum for MR tasks to 20
> is far more than enough (518/64 = 8). Did I get that right?
> When I run the MR program, I can see in the Map/Reduce Administration
> page that the number of Maps Total = 8, so I assume that everything is
> going well here. Once again, if I'm wrong please correct me.
> (Sometimes it shows only Maps Total = 3.)
> There's one thing I'm uncertain about regarding Hadoop distribution.
> Does Maps Total = 8 mean that there are 8 map tasks split among all
> the data nodes (task trackers)?
> Is there any way I can check whether all the tasks are shared among the
> datanodes (where the task trackers are running)?
> When I click on each link under Task Id, I can see "Input
> Split Locations" stated under each task's details. If the inputs are
> split between data nodes, does that mean that everything is working
> well?
> I need to make sure everything is running well because my MR job took
> around 6 hours to finish even though the input size is small. (Well, I know
> Hadoop is not meant for small data.) I'm not sure whether it's my
> configuration that is wrong or Hadoop is just not suitable for my case.
> I'm actually running a Mahout kmeans analysis.
> Thank you for your time.
