Subject: Re: Stability issue - dead DN's
From: James Seigel <james@tynt.com>
Date: Wed, 11 May 2011 06:54:06 -0600
To: general@hadoop.apache.org

Evert,

What's the stack trace, and what version of Hadoop do you have installed, Sir?

James.

On 2011-05-11, at 3:23 AM, Evert Lammerts wrote:

> Hi list,
>
> I notice that whenever our Hadoop installation is put under heavy load we lose one or two (out of a total of five) datanodes. This results in IOExceptions and affects the overall performance of the job being run. Can anybody give me advice or best practices on a different configuration to increase stability? Below I've included the specs of the cluster, the Hadoop-related config, and an example of when and how things go wrong. Any help is very much appreciated, and if I can provide any other info please let me know.
>
> Cheers,
> Evert
>
> == What goes wrong, and when ==
>
> See attached a screenshot of Ganglia when the cluster is under load of a single job. This job:
> * reads ~1TB from HDFS
> * writes ~200GB to HDFS
> * runs 288 Mappers and 35 Reducers
>
> When the job runs it takes all available Map and Reduce slots. The system starts swapping and there is a short time interval during which most cores are in WAIT. After that the job really starts running. At around halfway, one or two datanodes become unreachable and are marked as dead nodes. The number of under-replicated blocks becomes huge. Then some "java.io.IOException: Could not obtain block" exceptions are thrown in Mappers.
> The job does manage to finish successfully after around 3.5 hours, but my fear is that when we make the input much larger - which we want - the system becomes too unstable to finish the job.
>
> Maybe worth mentioning - never know what might help diagnostics. We notice that memory usage drops when we switch our keys from Text to LongWritable. Also, the Mappers are done in a fraction of the time. However, for some reason this results in much more network traffic and makes the Reducers extremely slow. We're working on figuring out what causes this.
>
>
> == The cluster ==
>
> We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on CentOS 5.5. One of them acts as NN and JT, the other 5 run DNs and TTs. Each node has:
> * 16GB RAM
> * 32GB swap space
> * 4 cores
> * 11 LVMs of 4 x 500GB disks (2TB in total) for HDFS
> * non-HDFS stuff on separate disks
> * a 2x1GE bonded network interface for interconnects
> * a 2x1GE bonded network interface for external access
>
> I realize that this is not a well balanced system, but it's what we had available for a prototype environment. We're working on putting together a specification for a much larger production environment.
>
>
> == Hadoop config ==
>
> Here are some properties that I think might be relevant:
>
> __CORE-SITE.XML__
>
> fs.inmemory.size.mb: 200
> mapreduce.task.io.sort.factor: 100
> mapreduce.task.io.sort.mb: 200
> # 1024*1024*4 bytes = 4 MB, the blocksize of the LVMs
> io.file.buffer.size: 4194304
>
> __HDFS-SITE.XML__
>
> # 1024*1024*4*32 bytes = 128 MB, 32 times the blocksize of the LVMs
> dfs.block.size: 134217728
> # Only 5 DNs, but this shouldn't hurt
> dfs.namenode.handler.count: 40
> # This got rid of the occasional "Could not obtain block"s
> dfs.datanode.max.xcievers: 4096
>
> __MAPRED-SITE.XML__
>
> mapred.tasktracker.map.tasks.maximum: 4
> mapred.tasktracker.reduce.tasks.maximum: 4
> mapred.child.java.opts: -Xmx2560m
> mapreduce.reduce.shuffle.parallelcopies: 20
> mapreduce.map.java.opts: -Xmx512m
> mapreduce.reduce.java.opts: -Xmx512m
> # Compression codecs are configured and seem to work fine
> mapred.compress.map.output: true
> mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec
>
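For reference, a minimal sketch of how the HDFS and MapReduce properties listed above would look when written out in the usual <property>/<name>/<value> form of hdfs-site.xml and mapred-site.xml (values copied from the list in the quoted mail; the file placement under conf/ is the standard default, not something stated in this thread):

<!-- hdfs-site.xml (sketch, using the values quoted above) -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>  <!-- 128 MB -->
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>

<!-- mapred-site.xml (sketch) -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2560m</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>

The core-site.xml properties quoted above would follow the same pattern.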