hadoop-common-user mailing list archives

From "Kandoi, Nikhil" <Nikhil.Kan...@emc.com>
Subject Estimating the time of my hadoop jobs
Date Tue, 17 Dec 2013 10:39:30 GMT
Hello everyone,

I am new to Hadoop and would like to see if I'm on the right track.
I am currently developing an application that will ingest on the order of 60-70 GB of logs per day
and then run some analysis on them.
My infrastructure is a 4-node cluster (all nodes are virtual machines), and each node has
4 GB of RAM.

When I run a sample dataset of about 30 GB on this cluster,
it takes about 3 hours to process all of it.
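
For reference, here is a rough back-of-the-envelope sketch of what those figures imply. It assumes 1 GB = 1024 MB, that the work is spread evenly over the 4 nodes, and that run time scales linearly with input size; none of these is guaranteed on a real cluster, and the function names are only illustrative.

```python
# Rough throughput estimate from the numbers in this message.
# Assumptions (not measured): linear scaling with input size,
# even distribution of work across nodes, 1 GB = 1024 MB.

SAMPLE_GB = 30      # size of the sample dataset
SAMPLE_HOURS = 3    # observed run time for the sample
NODES = 4           # nodes in the cluster


def throughput_mb_per_s(data_gb, hours):
    """Average processing rate in MB/s for the whole cluster."""
    return data_gb * 1024 / (hours * 3600)


def projected_hours(target_gb, sample_gb, sample_hours):
    """Run time for target_gb, assuming time grows linearly with data size."""
    return target_gb * sample_hours / sample_gb


cluster_rate = throughput_mb_per_s(SAMPLE_GB, SAMPLE_HOURS)
per_node_rate = cluster_rate / NODES

print(f"Cluster throughput:  {cluster_rate:.2f} MB/s")
print(f"Per-node throughput: {per_node_rate:.2f} MB/s")

# Projection for the expected daily volume of 60-70 GB.
for daily_gb in (60, 70):
    hours = projected_hours(daily_gb, SAMPLE_GB, SAMPLE_HOURS)
    print(f"{daily_gb} GB/day would take about {hours:.1f} hours")
```

Under these assumptions the cluster averages a little under 3 MB/s in total, and the full daily volume would need roughly 6 to 7 hours per day.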

I would like to know whether this is a normal amount of time for this kind of infrastructure.

Thank you

Nikhil Kandoi
