Message-ID: <49A45056.2010405@yahoo-inc.com>
Date: Tue, 24 Feb 2009 11:53:58 -0800
From: Raghu Angadi
To: core-dev@hadoop.apache.org
Subject: Re: [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
References: <178281774.1225759904336.JavaMail.jira@brutus> <135390706.1235496962513.JavaMail.jira@brutus> <314098690902241018wf5b7416qdde18fade89c9f3b@mail.gmail.com> <49A44BFF.7050909@yahoo-inc.com>
In-Reply-To: <49A44BFF.7050909@yahoo-inc.com>

Raghu Angadi wrote:
> jason hadoop wrote:
>> Any reason for not using an internal or external agent that receives
>> notification from the operating system about filesystem operations in
>> the block storage subtree?
>
> lack of a patch to do so, maybe?

Please let us know if there is a Java prototype implementation of this.
I think NIO.2 has an interface for this, but I am not sure there is an
equivalent solution for JDK 1.6. Once one is available, it could be
enabled as an option.

thanks,
Raghu.
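For reference, the NIO.2 interface mentioned above would look roughly like
the minimal sketch below (JDK 7 and later; the watched path is only a
placeholder, and a real DataNode would register each configured dfs.data.dir
and handle OVERFLOW by falling back to a full scan):

    import java.io.IOException;
    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;
    import static java.nio.file.StandardWatchEventKinds.*;

    public class BlockDirWatcher {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Placeholder path; a real DataNode would register each configured
            // data directory and fold these events into its in-memory block map.
            Path blockDir = Paths.get(args.length > 0 ? args[0] : "/tmp/dfs/data");

            WatchService watcher = FileSystems.getDefault().newWatchService();
            blockDir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);

            while (true) {
                WatchKey key = watcher.take();   // blocks until something changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == OVERFLOW) {
                        // Events were dropped; only here would a full rescan be needed.
                        continue;
                    }
                    Path changed = (Path) event.context();
                    System.out.println(event.kind().name() + ": " + blockDir.resolve(changed));
                }
                if (!key.reset()) {
                    break;                       // directory is no longer accessible
                }
            }
        }
    }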
> Raghu.
>
>> On Tue, Feb 24, 2009 at 9:36 AM, Raghu Angadi (JIRA) wrote:
>>
>>> [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344 ]
>>>
>>> Raghu Angadi commented on HADOOP-4584:
>>> --------------------------------------
>>>
>>> Ideally there is no requirement for block reports. They are essentially
>>> used as a 'catch all' for various bugs and errors. (Of course, they are
>>> now overloaded with the job of informing the NameNode about deletions;
>>> this should be removed.)
>>>
>>> Yes, it specifically removes the disk scan without fundamentally changing
>>> the meaning of block reports. Now the DN informs the NameNode about the
>>> blocks it thinks it has, because:
>>>
>>> * 'rm -r' by an admin is just one of the many things that can go wrong
>>> with blocks on a datanode. There is no particular reason to have this
>>> very costly disk scan (with a global lock held) just for that case.
>>> ** In fact, 'rm -r' is probably the least likely error (I haven't seen
>>> it even once in practice).
>>>
>>> * We have periodic block verification, which does handle the various
>>> things that can go wrong with a block (and it can improve further).
>>> ** So 'rm -r' will still be handled, just at the same rate as the rest
>>> of the block problems.
>>>
>>> * On the other hand, many users have complained about datanode scans
>>> taking tens of minutes and making datanodes lose heartbeats.
>>> ** This makes the system pretty unusable and is a major obstacle to
>>> graceful degradation under load and to scalability.
>>> ** One can argue that those users should not have so many blocks, but I
>>> think the DN should still handle it to the best of its abilities and not
>>> die on them.
>>> ** Disks might also be slow for many other reasons (other tasks on the
>>> machine, etc.).
>>>
>>> * I think this is orthogonal to HADOOP-1079, since that addresses the
>>> RPC and NameNode overhead of block reports. This jira is only about the
>>> DataNode side.
>>>
>>> Yes, this is a bigger change in semantics than what we proposed earlier
>>> (scanning the directories slowly, without holding the global lock), but
>>> the offline scan looks like a workaround for a problem that does not
>>> need to be solved. Not scanning is much simpler than handling an offline
>>> scan.
>>>
>>> Eventually we need to reduce the frequency of block reports; that can be
>>> done as soon as we add acks for block deletions. This JIRA is a major
>>> step in that direction.
>>>
>>>> Slow generation of blockReport at DataNode causes delay of sending
>>>> heartbeat to NameNode
>>>> ----------------------------------------------------------------------
>>>>
>>>> Key: HADOOP-4584
>>>> URL: https://issues.apache.org/jira/browse/HADOOP-4584
>>>> Project: Hadoop Core
>>>> Issue Type: Bug
>>>> Components: dfs
>>>> Reporter: Hairong Kuang
>>>> Assignee: Suresh Srinivas
>>>> Fix For: 0.20.0
>>>> Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
>>>> 4584.patch, 4584.patch
>>>>
>>>> Sometimes, due to disk or other problems, the datanode takes minutes or
>>>> tens of minutes to generate a block report. This leaves the datanode
>>>> unable to send a heartbeat to the NameNode every 3 seconds. In the
>>>> worst case, it makes the NameNode detect a lost heartbeat and wrongly
>>>> decide that the datanode is dead.
>>>> It would be nice to have two threads instead. One thread scans the data
>>>> directories, generates the block report, and executes the requests sent
>>>> by the NameNode; the other thread sends heartbeats and block reports
>>>> and picks up the requests from the NameNode. With these two threads,
>>>> the sending of heartbeats will not get delayed by a slow block report
>>>> or by slow execution of NameNode requests.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
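For illustration, a minimal sketch of the two-thread arrangement described
in the quoted issue, using java.util.concurrent. The class and method names
here are made up for the example and are not taken from the 4584.patch
attachments:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class HeartbeatVsBlockReportSketch {

        /** Stand-in for the RPC that sends a heartbeat to the NameNode. */
        static void sendHeartbeat() {
            System.out.println("heartbeat at " + System.currentTimeMillis());
        }

        /** Stand-in for the expensive directory scan that builds a block report. */
        static void generateAndSendBlockReport() throws InterruptedException {
            Thread.sleep(10000);           // simulate a slow disk scan
            System.out.println("block report sent");
        }

        public static void main(String[] args) {
            ScheduledExecutorService heartbeats =
                    Executors.newSingleThreadScheduledExecutor();
            ScheduledExecutorService blockReports =
                    Executors.newSingleThreadScheduledExecutor();

            // Heartbeats every 3 seconds, on a thread a slow scan cannot stall.
            heartbeats.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    sendHeartbeat();
                }
            }, 0, 3, TimeUnit.SECONDS);

            // Block reports on their own, much longer schedule and their own thread.
            blockReports.scheduleWithFixedDelay(new Runnable() {
                public void run() {
                    try {
                        generateAndSendBlockReport();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, 0, 60, TimeUnit.MINUTES);
        }
    }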