hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Mon, 02 Feb 2009 19:49:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669719#action_12669719 ]

Raghu Angadi commented on HADOOP-4584:
--------------------------------------

I think this approach might work OK for now. It makes sure the data node is not marked dead.
But this should be considered mostly a workaround. We should note that the fundamental problem
still remains (a little less lethal), e.g. a) new blocks are not reported, b) no new blocks
can be written during this time, c) (not sure) no blocks can be read? etc.

If all the nodes are taking very long to process the block report, many operations on HDFS
will fail. An admin can increase the block report period to reduce the effect of this problem.
The current fix works fine for occasional delays.

> In step 4. should we wait for receiving a command or for receiving another block?

Both would be better.

> In OfferService we process all the commands that are in the queue at once. Do you see
> any issues with it?

Not fundamentally different. One main issue would be that there might sometimes be thousands
of blocks to delete. But that is the same problem as a long block report.

Regarding a more complete fix, I could file another JIRA to propose a fix that I discussed with
Sameer and Hairong, which satisfies all the requirements on the current block report.



> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes
> to generate a block report. This prevents the datanode from sending a heartbeat to the
> NameNode every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and
> wrongly decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories,
> generates the block report, and executes the requests sent by the NameNode; the other
> thread sends heartbeats and block reports and picks up the requests from the NameNode.
> With these two threads, the sending of heartbeats will not be delayed by a slow block
> report or slow execution of NameNode requests.
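
The two-thread split described in the issue can be illustrated with a minimal Java sketch. This is not the attached patch and not DataNode code: the class HeartbeatSketch, its methods, and the simulated delays are all hypothetical, chosen only to show that a slow scan on one thread need not delay heartbeats sent from another.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
class HeartbeatSketch {
    private final AtomicInteger heartbeatsSent = new AtomicInteger();
    private final AtomicInteger reportsSent = new AtomicInteger();
    // Hand-off slot: the scanner publishes a finished report here.
    private final AtomicReference<String> readyReport = new AtomicReference<>();
    private volatile boolean running = true;

    /** Worker thread: stands in for the slow scan of the data directories. */
    private void scanLoop(long scanMillis) {
        try {
            while (running) {
                Thread.sleep(scanMillis);          // simulated slow disk scan
                readyReport.set("block-report");   // publish the finished report
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Communication thread: sends heartbeats and any report that is ready. */
    private void heartbeatLoop(long intervalMillis) {
        try {
            while (running) {
                heartbeatsSent.incrementAndGet();  // simulated heartbeat RPC
                String report = readyReport.getAndSet(null);
                if (report != null) {
                    reportsSent.incrementAndGet(); // simulated block report RPC
                }
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs both threads for totalMillis; returns {heartbeats, reports}. */
    static int[] runFor(long totalMillis, long heartbeatMillis, long scanMillis)
            throws InterruptedException {
        HeartbeatSketch s = new HeartbeatSketch();
        Thread scanner = new Thread(() -> s.scanLoop(scanMillis), "scanner");
        Thread heartbeat = new Thread(() -> s.heartbeatLoop(heartbeatMillis), "heartbeat");
        scanner.start();
        heartbeat.start();
        Thread.sleep(totalMillis);
        s.running = false;
        scanner.interrupt();
        heartbeat.interrupt();
        scanner.join();
        heartbeat.join();
        return new int[] { s.heartbeatsSent.get(), s.reportsSent.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        // Scan takes 10x longer than the heartbeat interval in this simulation.
        int[] r = runFor(600, 20, 200);
        System.out.println("heartbeats=" + r[0] + " reports=" + r[1]);
    }
}
```

In this sketch the heartbeat count keeps growing while the scanner sleeps, which is the property the issue asks for. As the comment above notes, a split like this only keeps the node from being marked dead; it does not by itself make new blocks visible sooner.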

