Subject: Re: Stability issue - dead DN's
From: James Seigel <james@tynt.com>
Date: Wed, 11 May 2011 06:54:06 -0600
To: general@hadoop.apache.org

Evert,

What's the stack trace, and what version of Hadoop do you have installed, Sir?

James.

On 2011-05-11, at 3:23 AM, Evert Lammerts wrote:

> Hi list,
>
> I notice that whenever our Hadoop installation is put under heavy load we lose one or two (out of a total of five) datanodes. This results in IOExceptions and affects the overall performance of the job being run. Can anybody give me advice or best practices on a different configuration to increase stability? Below I've included the specs of the cluster, the Hadoop-related config, and an example of when and how things go wrong. Any help is very much appreciated, and if I can provide any other info please let me know.
>
> Cheers,
> Evert
>
> == What goes wrong, and when ==
>
> See attached a screenshot of Ganglia when the cluster is under load of a single job. This job:
> * reads ~1TB from HDFS
> * writes ~200GB to HDFS
> * runs 288 Mappers and 35 Reducers
>
> When the job runs it takes all available Map and Reduce slots. The system starts swapping and there is a short time interval during which most cores are in WAIT. After that the job really starts running. At around halfway, one or two datanodes become unreachable and are marked as dead nodes. The number of under-replicated blocks becomes huge. Then some "java.io.IOException: Could not obtain block" exceptions are thrown in Mappers.
> The job does manage to finish successfully after around 3.5 hours, but my fear is that when we make the input much larger - which we want - the system becomes too unstable to finish the job.
>
> Maybe worth mentioning - never know what might help diagnostics. We notice that memory usage drops when we switch our keys from Text to LongWritable. Also, the Mappers are done in a fraction of the time. However, for some reason this results in much more network traffic and makes the Reducers extremely slow. We're working on figuring out what causes this.
>
>
> == The cluster ==
>
> We have a cluster that consists of 6 Sun Thumpers running Hadoop 0.20.2 on CentOS 5.5. One of them acts as NN and JT, the other 5 run DNs and TTs. Each node has:
> * 16GB RAM
> * 32GB swap space
> * 4 cores
> * 11 LVMs of 4 x 500GB disks (2TB in total) for HDFS
> * non-HDFS stuff on separate disks
> * a 2x1GE bonded network interface for interconnects
> * a 2x1GE bonded network interface for external access
>
> I realize that this is not a well balanced system, but it's what we had available for a prototype environment. We're working on putting together a specification for a much larger production environment.
>
>
> == Hadoop config ==
>
> Here are some properties that I think might be relevant:
>
> __CORE-SITE.XML__
>
> fs.inmemory.size.mb: 200
> mapreduce.task.io.sort.factor: 100
> mapreduce.task.io.sort.mb: 200
> # 1024*1024*4 bytes = 4 MB, the blocksize of the LVMs
> io.file.buffer.size: 4194304
>
> __HDFS-SITE.XML__
>
> # 1024*1024*4*32 bytes = 128 MB, 32 times the blocksize of the LVMs
> dfs.block.size: 134217728
> # Only 5 DNs, but this shouldn't hurt
> dfs.namenode.handler.count: 40
> # This got rid of the occasional "Could not obtain block"s
> dfs.datanode.max.xcievers: 4096
>
> __MAPRED-SITE.XML__
>
> mapred.tasktracker.map.tasks.maximum: 4
> mapred.tasktracker.reduce.tasks.maximum: 4
> mapred.child.java.opts: -Xmx2560m
> mapreduce.reduce.shuffle.parallelcopies: 20
> mapreduce.map.java.opts: -Xmx512m
> mapreduce.reduce.java.opts: -Xmx512m
> # Compression codecs are configured and seem to work fine
> mapred.compress.map.output: true
> mapred.map.output.compression.codec: com.hadoop.compression.lzo.LzoCodec
>
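For reference, a minimal sketch of how the HDFS and MapReduce properties listed above would look when written out in the usual <property>/<name>/<value> form of hdfs-site.xml and mapred-site.xml (values copied from the list in the quoted mail; the file placement under conf/ is the standard default, not something stated in this thread):

<!-- hdfs-site.xml (sketch, using the values quoted above) -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>  <!-- 128 MB -->
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>

<!-- mapred-site.xml (sketch) -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2560m</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>

The core-site.xml properties quoted above would follow the same pattern.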