Subject: Re: problem with completion notification from block movement
From: Karl Kleinpaste
To: core-user@hadoop.apache.org
Date: Tue, 03 Feb 2009 08:15:41 -0500

On Mon, 2009-02-02 at 20:06 -0800, jason hadoop wrote:
> This can be made significantly worse by your underlying host file
> system and the disks that support it.

Oh, yes, we know... It was a late-realized mistake just yesterday that we weren't using noatime on that cluster's slaves.

The attached graph is instructive. We have our nightly-rotated logs for DataNode all the way back to when this test cluster was created in November. This morning on one node, I sampled the first 10 BlockReport scan lines from each day's log, up through the current hour today, and handed it to gnuplot to graph.

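For anyone wanting to reproduce that kind of sampling, here is a rough sketch, not the script actually used for the graph. It assumes the DataNode log lines look like "BlockReport of N blocks got processed in M msecs" and that the rotated logs match a glob such as `*datanode*.log*`; both the line format and the file naming are assumptions, since the message quotes neither. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed log line shape (not quoted in the message):
#   ... BlockReport of 912345 blocks got processed in 48211 msecs
BLOCK_REPORT = re.compile(r"BlockReport of (\d+) blocks got processed in (\d+) msecs")


def sample_block_reports(lines, limit=10):
    """Return up to `limit` (block_count, msecs) pairs from an iterable of log lines."""
    samples = []
    for line in lines:
        match = BLOCK_REPORT.search(line)
        if match:
            samples.append((int(match.group(1)), int(match.group(2))))
            if len(samples) == limit:
                break
    return samples


def write_gnuplot_data(log_dir, out_path, pattern="*datanode*.log*", limit=10):
    """Write one 'blocks msecs' row per sampled report, for every matching log file."""
    with open(out_path, "w") as out:
        for log_file in sorted(Path(log_dir).glob(pattern)):
            with open(log_file, errors="replace") as handle:
                for blocks, msecs in sample_block_reports(handle, limit):
                    out.write(f"{blocks} {msecs}\n")
```

Calling `write_gnuplot_data("/path/to/logs", "blockreport.dat")` (the path is a placeholder) produces a two-column file that gnuplot can draw with `plot "blockreport.dat" using 1:2 with points`, putting block count on the x axis and scan time on the y axis.
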
The seriously erratic behavior that begins around the 900K-1M point is very disturbing. Immediate solutions for us include noatime, nodiratime, BIOS upgrade on the discs, and eliminating enough small files (blocks) in DFS to get the total count below 400K.
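
As a quick way to audit the noatime/nodiratime part of that list, the sketch below scans text in the Linux /proc/mounts format and reports which mount points lack each option. It is only an illustration: the function name is hypothetical, and it checks every mount rather than just the DataNode data directories, whose locations the message does not give.

```python
def mounts_missing_options(mounts_text, wanted=("noatime", "nodiratime")):
    """Map each mount point to the wanted options it lacks.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    missing = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = set(fields[3].split(","))
        lacking = [opt for opt in wanted if opt not in options]
        if lacking:
            missing[mount_point] = lacking
    return missing
```

On a slave node this could be fed with `open("/proc/mounts").read()`; any data-disk mount point that shows up in the result is still updating access times on every read.
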