From: Marc Limotte
To: "core-user@hadoop.apache.org"
Date: Fri, 24 Apr 2009 11:18:22 -0700
Subject: RE: Advice on restarting HDFS in a cron

Actually, I'm concerned about the performance of map/reduce jobs on a long-running cluster, i.e. it seems to get slower the longer it's running. After a restart of HDFS, the jobs seem to run faster. I'm not concerned about the start-up time of HDFS.

Of course, as you suggest, this could be poor configuration of the cluster on my part, but I'd still like to hear best practices around doing a scheduled restart.

Marc

-----Original Message-----
From: Allen Wittenauer [mailto:aw@yahoo-inc.com]
Sent: Friday, April 24, 2009 10:17 AM
To: core-user@hadoop.apache.org
Subject: Re: Advice on restarting HDFS in a cron

On 4/24/09 9:31 AM, "Marc Limotte" wrote:
> I've heard that HDFS starts to slow down after it's been running for a long
> time. And I believe I've experienced this.

We did an upgrade (== complete restart) of a 2000 node instance in ~20 minutes on Wednesday. I wouldn't really consider that 'slow', but YMMV.

I suspect people aren't running the secondary name node and therefore have a massively large edits file. The name node appears slow on restart because it has to apply the edits to the fsimage rather than having the secondary keep it up to date.

-----Original Message-----
From: Marc Limotte

Hi.

I've heard that HDFS starts to slow down after it's been running for a long time. And I believe I've experienced this. So, I was thinking of setting up a cron job to execute every week to shut down HDFS and start it up again.

In concept, it would be something like:

0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh

But I'm wondering if there is a safer way to do this. In particular:

* What if a map/reduce job is running when this cron hits? Is there a way to suspend jobs while the HDFS restart happens?
* Should I also restart the mapred daemons?
* Should I wait some time after "stop-dfs.sh" for things to settle down, before executing "start-dfs.sh"? Or maybe I should run a command to verify that it is stopped before I run the start?
Thanks for any help.

Marc
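On the last question in the original message (verifying that HDFS has stopped before starting it again), one possible approach is to poll the daemon's pid file instead of sleeping for a fixed time. The following is only a minimal sketch, not something proposed in the thread. It assumes the name node's pid file sits in $HADOOP_PID_DIR (falling back to /tmp here) and has a name ending in "-namenode.pid"; check both against your installation. The function names are hypothetical.

```sh
#!/bin/sh
# Sketch only: restart HDFS, starting again only after the old name node is gone.
# Assumptions (not from the thread): pid files live in $HADOOP_PID_DIR
# (fallback /tmp) and the name node's file ends in "-namenode.pid".

# Succeeds if the process recorded in pid file $1 is still alive.
# kill -0 sends no signal; it only checks that this user could signal the pid.
pid_file_alive() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1" 2>/dev/null)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Polls once per second, for up to $2 seconds, until the process named in
# pid file $1 is gone. Returns 0 once it is gone, 1 on timeout.
wait_until_stopped() {
    remaining=$2
    while pid_file_alive "$1"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Stops HDFS, waits for the local name node to exit, then starts HDFS again.
restart_hdfs() {
    pid_dir=${HADOOP_PID_DIR:-/tmp}
    "$HADOOP_HOME/bin/stop-dfs.sh" || return 1
    for f in "$pid_dir"/hadoop-*-namenode.pid; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if ! wait_until_stopped "$f" 60; then
            echo "name node still running, not restarting: $f" >&2
            return 1
        fi
    done
    "$HADOOP_HOME/bin/start-dfs.sh"
}
```

A cron entry could then run a small wrapper script that sources these functions and calls restart_hdfs. The sketch only checks the name node on the local machine, and it does nothing about map/reduce jobs that are running when the restart begins.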