hadoop-mapreduce-user mailing list archives

From "Kandoi, Nikhil" <Nikhil.Kan...@emc.com>
Subject RE: Estimating the time of my hadoop jobs
Date Tue, 17 Dec 2013 11:26:13 GMT
I know it is foolish of me to ask this, because there are a lot of factors that affect it,
but why is it taking so much time? Can anyone suggest possible reasons for it, or has anyone
faced such an issue before?

Nikhil Kandoi
P.S. - I am using Hadoop 1.0.3 for this application, so I wonder if this version has something
to do with it.

From: Azuryy Yu [mailto:azuryyyu@gmail.com]
Sent: Tuesday, December 17, 2013 4:14 PM
To: user@hadoop.apache.org
Subject: Re: Estimating the time of my hadoop jobs

Hi Kandoi,
It depends on:
how many cores on each VNode
how complex your analysis application is

But I don't think it's normal to spend 3 hours processing 30 GB of data, even on your *not good* hardware.

On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil <Nikhil.Kandoi@emc.com<mailto:Nikhil.Kandoi@emc.com>>
Hello everyone,

I am new to Hadoop and would like to see if I'm on the right track.
Currently I'm developing an application which would ingest logs on the order of 60-70 GB of data/day
and would then do some analysis on them.
The infrastructure that I have is a 4-node cluster (all nodes on virtual machines); all
nodes have 4 GB RAM.

But when I try to run the dataset (which is a sample dataset at this point) of about 30 GB,
it takes about 3 hours to process all of it.

I would like to know whether it is normal for this kind of infrastructure to take this amount of time.
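One way to sanity-check a runtime like this is a quick back-of-envelope calculation. The sketch below assumes the Hadoop 1.x defaults of a 64 MB HDFS block size and 2 map slots per TaskTracker (mapred.tasktracker.map.tasks.maximum); neither value is stated in the thread, so adjust them to match the actual cluster configuration:

```python
# Back-of-envelope estimate of the cluster's aggregate throughput.
# Assumed values (not from the thread): 64 MB block size, 2 map slots/node.
data_mb = 30 * 1024            # 30 GB sample dataset
elapsed_s = 3 * 3600           # reported 3-hour runtime
block_mb = 64                  # Hadoop 1.x default HDFS block size
nodes = 4
map_slots_per_node = 2         # default mapred.tasktracker.map.tasks.maximum

throughput = data_mb / elapsed_s                  # aggregate MB/s
map_tasks = data_mb // block_mb                   # roughly one map per block
waves = map_tasks / (nodes * map_slots_per_node)  # scheduling waves
secs_per_block = elapsed_s / waves                # avg time per 64 MB block

print(f"aggregate throughput: {throughput:.2f} MB/s")  # ~2.84 MB/s
print(f"map tasks: {map_tasks}, waves: {waves:.0f}")   # 480 tasks, ~60 waves
print(f"avg seconds per 64 MB block: {secs_per_block:.0f}")  # ~180 s
```

Under these assumptions each map task averages roughly 0.35 MB/s, which is far below what even modest disks deliver, so the bottleneck is more likely memory pressure on the 4 GB VMs (swapping, small JVM heaps) or CPU-heavy analysis code than the Hadoop version itself.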

Thank you

Nikhil Kandoi
