hadoop-common-dev mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Tue, 24 Feb 2009 20:47:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676407#action_12676407 ]

Suresh Srinivas commented on HADOOP-4584:

Currently, one version of the patch (Feb 10) introduces a separate heartbeat thread without
the other changes.

I would like to reach consensus on whether we need to detect missing files faster than block
verification can. Once we agree on that, we can pursue the long-term solution, which should
include:
Block deleted report:
The datanode would send a "block deleted" notification to the namenode (much like "block
received"). Currently, the namenode learns about deleted blocks only through the block report.
With this change, block reports could be sent less frequently.
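A minimal sketch of the datanode side of the block-deleted report, assuming deletions are queued as they happen and drained into the next heartbeat, the same way received blocks are batched. `BlockDeletedQueue`, `onBlockDeleted` and `drainForHeartbeat` are hypothetical names for illustration, not Hadoop APIs, and block IDs are reduced to plain `long` values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop code: queues the IDs of blocks deleted on the
// datanode so they can be reported to the namenode incrementally, the way
// received blocks are, instead of waiting for the next full block report.
class BlockDeletedQueue {
    private final List<Long> pending = new ArrayList<>();

    // Called by the deletion path after a block file has been removed from disk.
    synchronized void onBlockDeleted(long blockId) {
        pending.add(blockId);
    }

    // Called by the heartbeat thread: returns everything queued so far and
    // clears the queue, so each deletion is handed over once.
    synchronized List<Long> drainForHeartbeat() {
        List<Long> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

Because the namenode would hear about each deletion within about one heartbeat interval, the full block report would mainly act as a periodic consistency check. A real implementation would also need to requeue a batch if the RPC carrying it fails; the sketch ignores that.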

Faster block scanning functionality for missing/lingering files:
- We could have a thread that lists the files (without holding a global lock) and reconciles
the blocks on the disk with the blocks maintained in FSDataset. This could be done by deleting
blocks from the FSDataset map, or otherwise correcting it, under the following conditions:
  - the block file or the block metadata file is missing on the disk but the block exists in
the FSDataset map
  - the block metadata does not match the block information (block size and generation stamp)
in the FSDataset map
  - the block file or the block metadata file exists on the disk but does not exist in the
FSDataset map
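The reconciliation can be sketched as a comparison between a snapshot of the in-memory map and a listing of the disk, producing one discrepancy kind per condition. This is a minimal sketch under assumptions: `InMemoryBlock`, `DiskBlock`, `Discrepancy` and `DiskScanSketch` are hypothetical types, not the real `FSDataset` classes; each block is reduced to its ID, length and generation stamp; and the disk listing is assumed to have been taken already, without holding the global lock. It only classifies mismatches and leaves the corrective action to the caller.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified view of a block as recorded in the in-memory map.
final class InMemoryBlock {
    final long length;
    final long genStamp;

    InMemoryBlock(long length, long genStamp) {
        this.length = length;
        this.genStamp = genStamp;
    }
}

// Hypothetical result of listing one block ID on disk. An entry exists only if
// at least one of the two files (block file, metadata file) was found.
final class DiskBlock {
    final boolean hasBlockFile;
    final boolean hasMetaFile;
    final long length;   // size of the block file (meaningful only if hasBlockFile)
    final long genStamp; // stamp read from the metadata file (meaningful only if hasMetaFile)

    DiskBlock(boolean hasBlockFile, boolean hasMetaFile, long length, long genStamp) {
        this.hasBlockFile = hasBlockFile;
        this.hasMetaFile = hasMetaFile;
        this.length = length;
        this.genStamp = genStamp;
    }
}

// One value per reconciliation condition.
enum Discrepancy { MISSING_ON_DISK, METADATA_MISMATCH, NOT_IN_MAP }

final class DiskScanSketch {
    // Compares the in-memory map with a disk listing, both keyed by block ID,
    // and returns every block ID that needs reconciling.
    static Map<Long, Discrepancy> reconcile(Map<Long, InMemoryBlock> memory,
                                            Map<Long, DiskBlock> disk) {
        Map<Long, Discrepancy> result = new TreeMap<>();
        for (Map.Entry<Long, InMemoryBlock> e : memory.entrySet()) {
            DiskBlock d = disk.get(e.getKey());
            InMemoryBlock m = e.getValue();
            if (d == null || !d.hasBlockFile || !d.hasMetaFile) {
                // A file is missing on disk, but the map still has the block.
                result.put(e.getKey(), Discrepancy.MISSING_ON_DISK);
            } else if (d.length != m.length || d.genStamp != m.genStamp) {
                // Block size or generation stamp disagrees with the map.
                result.put(e.getKey(), Discrepancy.METADATA_MISMATCH);
            }
        }
        for (Long id : disk.keySet()) {
            if (!memory.containsKey(id)) {
                // Files on disk that the map does not know about.
                result.put(id, Discrepancy.NOT_IN_MAP);
            }
        }
        return result;
    }
}
```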

I was thinking of doing this in a separate JIRA as a long-term change.

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch
> Sometimes, due to disk or other problems, the datanode takes minutes or tens of minutes
> to generate a block report. This prevents the datanode from sending a heartbeat to the
> NameNode every 3 seconds. In the worst case, the NameNode detects lost heartbeats and
> wrongly decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans data directories, generates
> block reports, and executes the requests sent by the NameNode; another thread sends
> heartbeats and block reports and picks up the requests from the NameNode. With these two
> threads, sending heartbeats will not be delayed by a slow block report or by slow
> execution of NameNode requests.
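The proposed split can be sketched with two single-thread executors from `java.util.concurrent`: one sends heartbeats at a fixed rate, the other generates block reports, and they communicate only through a flag. This is a minimal sketch under assumptions, not the patch attached to this issue: `TwoThreadDataNodeSketch`, `requestBlockReport` and the counters are hypothetical names, the RPCs are replaced by counters, and a slow directory scan is simulated with `Thread.sleep`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the two-thread split; not Hadoop code.
final class TwoThreadDataNodeSketch {
    final AtomicInteger heartbeatsSent = new AtomicInteger();
    final AtomicInteger reportsSent = new AtomicInteger();
    // Set by the scanner thread when a block report is ready to be sent.
    private final AtomicBoolean reportReady = new AtomicBoolean();

    // Daemon threads, so a stuck scan cannot keep the JVM alive.
    private static final ThreadFactory DAEMON = r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    };
    private final ScheduledExecutorService heartbeatThread =
            Executors.newSingleThreadScheduledExecutor(DAEMON);
    private final ExecutorService scannerThread = Executors.newSingleThreadExecutor(DAEMON);

    // Heartbeat thread: fires on a fixed schedule, independently of any scan.
    void start(long heartbeatMillis) {
        heartbeatThread.scheduleAtFixedRate(this::heartbeatOnce, 0, heartbeatMillis,
                TimeUnit.MILLISECONDS);
    }

    private void heartbeatOnce() {
        heartbeatsSent.incrementAndGet();      // stands in for the heartbeat RPC
        if (reportReady.compareAndSet(true, false)) {
            reportsSent.incrementAndGet();     // stands in for the block report RPC
        }
        // A real datanode would also queue NameNode commands from the
        // heartbeat reply here, for the other thread to execute.
    }

    // Scanner thread: generates the block report, however long that takes.
    void requestBlockReport(long simulatedScanMillis) {
        scannerThread.execute(() -> {
            try {
                Thread.sleep(simulatedScanMillis);  // simulates a slow directory scan
                reportReady.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    void stop() {
        heartbeatThread.shutdownNow();
        scannerThread.shutdownNow();
    }
}
```

The design choice is that the threads share only a small handoff (here a flag; in a real datanode, likely a queue of reports and commands), so a scan that takes minutes cannot hold up the fixed-rate heartbeat.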

This message is automatically generated by JIRA.
