hadoop-mapreduce-user mailing list archives

From Artem Yankov <artem.yan...@gmail.com>
Subject Hadoop cluster on EC2: hangs on big chunks of data
Date Tue, 25 Oct 2011 17:56:15 GMT
Hey,

I set up a Hadoop cluster on EC2 following this documentation:
http://wiki.apache.org/hadoop/AmazonEC2

OS: Linux Fedora 8
Hadoop version is 0.20.203.0
java version "1.7.0_01"
heap size: 1 GB (stats always show that only 4% of it is used)
I use the mongo-hadoop plugin to read data from MongoDB.

Everything seems to work perfectly with small chunks of data: calculations
are fast, I get the results, and tasks
seem to be distributed normally among the slaves.

Then I try to load a huge amount of data (22 million records) and
everything hangs. The first slave receives a map task and the other slaves
do not. In the logs I constantly see this:

INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport:
from x.x.x.x:50010, blocks: 2, processing time: 0 m

I tried different numbers of slaves (up to 25 nodes), but it doesn't
help, because it seems that once the first slave receives a job it blocks
everything else. (Again, everything works fine with small chunks of
data.)

There is no significant CPU or memory load on the master.

Any ideas on what could be causing this?

Artem.
