hadoop-mapreduce-user mailing list archives

From Artem Yankov <artem.yan...@gmail.com>
Subject Hadoop cluster on EC2: hangs on big chunks of data
Date Tue, 25 Oct 2011 17:56:15 GMT

I set up a Hadoop cluster on EC2 using this documentation:

OS: Linux Fedora 8
Hadoop version is
java version "1.7.0_01"
heap size: 1 GB (stats always show that it uses only 4% of this)
I use the mongo-hadoop plugin to get data from MongoDB.
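A side note on the heap figure: the post doesn't say which "stats" report the 4%, but a daemon's status page only describes that daemon's own JVM. Map tasks run in separate child JVMs whose heap comes from mapred.child.java.opts (-Xmx200m by default in 0.20-era releases, if I remember correctly), so the 1 GB may not apply to them. Below is a minimal sketch for checking heap usage from inside any JVM, using only java.lang.Runtime; the class name HeapCheck is made up for illustration.

```java
// Hypothetical helper: report how much of the current JVM's maximum heap is in use.
// The numbers describe only the JVM this code runs in (e.g. a task's child JVM),
// not the NameNode or JobTracker.
public class HeapCheck {
    // Percentage of the maximum heap that is currently used.
    static double usedPercent(long usedBytes, long maxBytes) {
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                       // limit from -Xmx (or the JVM default)
        long used = rt.totalMemory() - rt.freeMemory();  // heap currently in use
        System.out.printf("max heap: %d MB, used: %.1f%%%n",
                max >> 20, usedPercent(used, max));
    }
}
```

Calling the same Runtime methods from a mapper's setup code would show the heap a task actually gets.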

Everything seems to work perfectly with small chunks of data: calculations
are fast, I get the results, and tasks seem to be distributed normally
among the slaves.

Then I try to load a huge amount of data (22 million records) and
everything hangs. The first slave receives a map task and the other slaves
do not. In the logs I constantly see this:

INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from x.x.x.x:50010, blocks: 2, processing time: 0 m

I tried different numbers of slaves (up to 25 nodes), but it doesn't help,
because it seems that when the first slave receives a job, it blocks
everything else. (Again, everything works fine with small chunks of data.)
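One possible explanation (an assumption, not something the logs above confirm): Hadoop starts one map task per input split, so if the input format produces a single split, only one slave can ever get work, no matter how many nodes are in the cluster. The sketch below illustrates that arithmetic. computeSplitSize copies the formula FileInputFormat uses for file-based input; mongo-hadoop computes its splits from MongoDB itself, so this is only an analogy. The class and helper names, the one-slot-per-slave simplification, and the 10 GB input size are hypothetical.

```java
// Hypothetical illustration: map tasks = input splits, so the split count
// caps how many slaves can work in parallel.
public class SplitParallelism {
    // Same formula as FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for totalBytes of input (ceiling division;
    // ignores FileInputFormat's 10% slop on the last split).
    static long numSplits(long totalBytes, long splitSize) {
        return (totalBytes + splitSize - 1) / splitSize;
    }

    // Slaves that can run a map task at once, assuming one map slot per slave.
    static long busySlaves(long splits, long slaves) {
        return Math.min(splits, slaves);
    }

    public static void main(String[] args) {
        long slaves = 25;
        // A single split: one slave gets the map task, the other 24 stay idle.
        System.out.println("1 split -> " + busySlaves(1, slaves) + " busy slave(s)");

        long splitSize = computeSplitSize(64L << 20, 1, Long.MAX_VALUE); // 64 MB block
        long splits = numSplits(10L << 30, splitSize);                   // 10 GB of input
        System.out.println(splits + " splits -> " + busySlaves(splits, slaves) + " busy slaves");
    }
}
```

If the job's status page shows only one map task for the large input, that would point at how the splits are being created rather than at the slaves themselves.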

There is no significant CPU or memory load on the master.

Any ideas on what could be causing this?

