Message-ID: <49A45056.2010405@yahoo-inc.com>
Date: Tue, 24 Feb 2009 11:53:58 -0800
From: Raghu Angadi
To: core-dev@hadoop.apache.org
Subject: Re: [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
References: <178281774.1225759904336.JavaMail.jira@brutus> <135390706.1235496962513.JavaMail.jira@brutus> <314098690902241018wf5b7416qdde18fade89c9f3b@mail.gmail.com> <49A44BFF.7050909@yahoo-inc.com>
In-Reply-To: <49A44BFF.7050909@yahoo-inc.com>

Raghu Angadi wrote:
> jason hadoop wrote:
>> Any reason for not using an internal or external agent that receives
>> notification from the operating system about filesystem operations in
>> the block storage subtree?
>
> lack of a patch to do so, maybe?

Please let us know if there is a Java prototype implementation of this.
I think NIO.2 has an interface for this, but I am not sure there is an
equivalent solution for JDK 1.6. Once one is available, it could be
enabled as an option.

thanks,
Raghu.
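For reference, the NIO.2 interface mentioned above would look roughly like
the minimal sketch below (JDK 7 and later; the watched path is only a
placeholder, and a real DataNode would register each configured dfs.data.dir
and handle OVERFLOW by falling back to a full scan):

    import java.io.IOException;
    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;
    import static java.nio.file.StandardWatchEventKinds.*;

    public class BlockDirWatcher {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Placeholder path; a real DataNode would register each configured
            // data directory and fold these events into its in-memory block map.
            Path blockDir = Paths.get(args.length > 0 ? args[0] : "/tmp/dfs/data");

            WatchService watcher = FileSystems.getDefault().newWatchService();
            blockDir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);

            while (true) {
                WatchKey key = watcher.take();   // blocks until something changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == OVERFLOW) {
                        // Events were dropped; only here would a full rescan be needed.
                        continue;
                    }
                    Path changed = (Path) event.context();
                    System.out.println(event.kind().name() + ": " + blockDir.resolve(changed));
                }
                if (!key.reset()) {
                    break;                       // directory is no longer accessible
                }
            }
        }
    }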
> Raghu.
>
>> On Tue, Feb 24, 2009 at 9:36 AM, Raghu Angadi (JIRA) wrote:
>>
>>> [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676344#action_12676344 ]
>>>
>>> Raghu Angadi commented on HADOOP-4584:
>>> --------------------------------------
>>>
>>> Ideally there is no requirement for block reports. They are essentially
>>> used as a 'catch all' for various bugs and errors. (Of course, they are
>>> now overloaded with the job of informing the NameNode about deletions;
>>> this should be removed.)
>>>
>>> Yes, it specifically removes the disk scan without fundamentally changing
>>> the meaning of block reports. Now the DN informs the NameNode about the
>>> blocks it thinks it has, because:
>>>
>>> * 'rm -r' by an admin is just one of the many things that can go wrong
>>> with blocks on a datanode. There is no particular reason to have this
>>> very costly disk scan (with a global lock held) just for that case.
>>> ** In fact, 'rm -r' is probably the least likely error (I haven't seen
>>> it even once in practice).
>>>
>>> * We have periodic block verification, which does handle the various
>>> things that can go wrong with a block (and it can improve further).
>>> ** So 'rm -r' will still be handled, just at the same rate as the rest
>>> of the block problems.
>>>
>>> * On the other hand, many users have complained about datanode scans
>>> taking tens of minutes and making datanodes lose heartbeats.
>>> ** This makes the system pretty unusable and is a major obstacle to
>>> graceful degradation under load and to scalability.
>>> ** One can argue that those users should not have so many blocks, but I
>>> think the DN should still handle it to the best of its abilities and not
>>> die on them.
>>> ** Disks might also be slow for many other reasons (other tasks on the
>>> machine, etc.).
>>>
>>> * I think this is orthogonal to HADOOP-1079, since that addresses the
>>> RPC and NameNode overhead of block reports. This jira is only about the
>>> DataNode side.
>>>
>>> Yes, this is a bigger change in semantics than what we proposed earlier
>>> (scanning the directories slowly, without holding the global lock), but
>>> the offline scan looks like a workaround for a problem that does not
>>> need to be solved. Not scanning is much simpler than handling an offline
>>> scan.
>>>
>>> Eventually we need to reduce the frequency of block reports; that can be
>>> done as soon as we add acks for block deletions. This JIRA is a major
>>> step in that direction.
>>>
>>>> Slow generation of blockReport at DataNode causes delay of sending
>>>> heartbeat to NameNode
>>>> ----------------------------------------------------------------------
>>>>
>>>> Key: HADOOP-4584
>>>> URL: https://issues.apache.org/jira/browse/HADOOP-4584
>>>> Project: Hadoop Core
>>>> Issue Type: Bug
>>>> Components: dfs
>>>> Reporter: Hairong Kuang
>>>> Assignee: Suresh Srinivas
>>>> Fix For: 0.20.0
>>>> Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
>>>> 4584.patch, 4584.patch
>>>>
>>>> Sometimes, due to disk or other problems, the datanode takes minutes or
>>>> tens of minutes to generate a block report. This leaves the datanode
>>>> unable to send a heartbeat to the NameNode every 3 seconds. In the
>>>> worst case, it makes the NameNode detect a lost heartbeat and wrongly
>>>> decide that the datanode is dead.
>>>> It would be nice to have two threads instead. One thread scans the data
>>>> directories, generates the block report, and executes the requests sent
>>>> by the NameNode; the other thread sends heartbeats and block reports
>>>> and picks up the requests from the NameNode. With these two threads,
>>>> the sending of heartbeats will not get delayed by a slow block report
>>>> or by slow execution of NameNode requests.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
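For illustration, a minimal sketch of the two-thread arrangement described
in the quoted issue, using java.util.concurrent. The class and method names
here are made up for the example and are not taken from the 4584.patch
attachments:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class HeartbeatVsBlockReportSketch {

        /** Stand-in for the RPC that sends a heartbeat to the NameNode. */
        static void sendHeartbeat() {
            System.out.println("heartbeat at " + System.currentTimeMillis());
        }

        /** Stand-in for the expensive directory scan that builds a block report. */
        static void generateAndSendBlockReport() throws InterruptedException {
            Thread.sleep(10000);           // simulate a slow disk scan
            System.out.println("block report sent");
        }

        public static void main(String[] args) {
            ScheduledExecutorService heartbeats =
                    Executors.newSingleThreadScheduledExecutor();
            ScheduledExecutorService blockReports =
                    Executors.newSingleThreadScheduledExecutor();

            // Heartbeats every 3 seconds, on a thread a slow scan cannot stall.
            heartbeats.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    sendHeartbeat();
                }
            }, 0, 3, TimeUnit.SECONDS);

            // Block reports on their own, much longer schedule and their own thread.
            blockReports.scheduleWithFixedDelay(new Runnable() {
                public void run() {
                    try {
                        generateAndSendBlockReport();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, 0, 60, TimeUnit.MINUTES);
        }
    }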