Date: Wed, 18 Jul 2012 12:00:13 +0200
Subject: Re: hbase mttr vs. hdfs
From: Stack
To: dev@hbase.apache.org

The proposal seems good to me. It's minimally intrusive. See also below...

On Mon, Jul 16, 2012 at 7:08 PM, N Keywal wrote:
> And to continue on this: for the files that are still open (i.e. our WAL
> files), we get two calls to the dead DN:
>
> - The first happens while opening the input stream, from
> DFSClient#updateBlockInfo. This call fails, but the exception is
> swallowed without being logged. The node info is not updated, yet no
> error is raised, so we continue without the right info. The timeout is
> 60 seconds. This call is on port 50020.
> - The second is the one already mentioned, for the data transfer, with
> a timeout of 69 seconds. The dead nodes list is not updated by the
> first failure, so if we were directed to the bad location the total
> wait time is more than 2 minutes.
>

Avoiding this second timeout is worth a bit of work on our part.

The NN is like the federal government. It has general, high-level policies
and knows about conditions at the macro level: network topologies,
placement policies. The DFSInputStream/DFSOutputStream is like local
government. It reacts to local conditions, for example by reordering the
node list if it just timed out the node in position zero. What's missing is
state government: smarts in the DFSClient, a way to inform adjacent local
governments about conditions that might affect their operation (dead or
lagging DNs, etc.).

St.Ack
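The timeout arithmetic in the quoted analysis (60 s for the updateBlockInfo call plus 69 s for the data transfer, more than 2 minutes in total) and the "state government" idea can be made concrete with a toy Java model. This is a sketch under stated assumptions, not DFSClient code: `DeadNodeSketch`, `SharedDeadNodes`, `secondsUntilLiveRead` and `scenario` are hypothetical names, the first replica in the NameNode's list is assumed to be the one contacted by updateBlockInfo, and every call to a dead node is assumed to run to its full timeout. It compares the current behavior, where the first failure is forgotten, with a client-wide registry that lets the data-transfer step skip a node already known to be dead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the wait described in the thread. All names here are
 * hypothetical; this is not the DFSClient API.
 */
final class DeadNodeSketch {

    // Timeouts quoted in the thread, in seconds.
    static final int BLOCK_INFO_TIMEOUT_S = 60;    // updateBlockInfo call (port 50020)
    static final int DATA_TRANSFER_TIMEOUT_S = 69; // block data transfer

    /** Hypothetical client-wide registry shared by every stream of one client. */
    static final class SharedDeadNodes {
        private final Set<String> dead = new HashSet<>();

        void report(String datanode) {
            dead.add(datanode);
        }

        /** Keeps the NameNode's order but moves reported nodes to the end. */
        List<String> reorder(List<String> nnOrder) {
            List<String> healthy = new ArrayList<>();
            List<String> suspect = new ArrayList<>();
            for (String dn : nnOrder) {
                if (dead.contains(dn)) {
                    suspect.add(dn);
                } else {
                    healthy.add(dn);
                }
            }
            healthy.addAll(suspect);
            return healthy;
        }
    }

    /**
     * Seconds spent before a read reaches a live replica. Assumes the
     * updateBlockInfo step contacts the first node in the NameNode's list
     * and that every call to a dead node runs to its full timeout.
     */
    static int secondsUntilLiveRead(List<String> nnOrder, Set<String> actuallyDead,
                                    boolean shareFailures) {
        SharedDeadNodes registry = new SharedDeadNodes();
        int waited = 0;

        // Step 1: updateBlockInfo. Without sharing, the failure is swallowed and forgotten.
        String first = nnOrder.get(0);
        if (actuallyDead.contains(first)) {
            waited += BLOCK_INFO_TIMEOUT_S;
            if (shareFailures) {
                registry.report(first); // the missing piece: remember the failure
            }
        }

        // Step 2: data transfer, trying nodes in order until one answers.
        List<String> order = shareFailures ? registry.reorder(nnOrder) : nnOrder;
        for (String dn : order) {
            if (!actuallyDead.contains(dn)) {
                return waited; // reached a live replica
            }
            waited += DATA_TRANSFER_TIMEOUT_S;
            registry.report(dn); // data-transfer failures do get recorded
        }
        return waited; // every replica timed out
    }

    /** The thread's scenario: three replicas, the first one listed is dead. */
    static int scenario(boolean shareFailures) {
        List<String> nnOrder = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> dead = Collections.singleton("dn1");
        return secondsUntilLiveRead(nnOrder, dead, shareFailures);
    }

    public static void main(String[] args) {
        System.out.println("without shared state: " + scenario(false) + " s"); // 129 s
        System.out.println("with shared state:    " + scenario(true) + " s");  // 60 s
    }
}
```

Under these assumptions the model reports 129 s without shared state and 60 s once the first failure is shared, i.e. the second timeout is saved. A real registry would likely also need entries to expire, since a node that timed out once may come back.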