hadoop-common-dev mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Mon, 02 Feb 2009 18:52:01 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-4584:
------------------------------------

    Attachment: 4584.patch

The current loop in {{Datanode.OfferService()}} performs the following steps:
1. If the next heartbeat interval has been reached, call {{sendHeartbeat}} and process the
{{DatanodeCommand}} returned by the namenode.
2. If a block has been received, send a {{blockReceived}} request to the namenode.
3. If the next blockreport interval has been reached, build and send the {{blockReport}}, then
process the {{DatanodeCommand}} returned by the namenode.
4. Wait until the next heartbeat interval or until another block is received.
5. Go back to 1.
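
To make the delay concrete, here is a minimal Java sketch that replays the five steps above on a virtual clock. It is not the DataNode code: the class SingleLoopModel, its method names, and the interval values in the usage note are hypothetical, and step 2 is left out because it does not affect the timing here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the single-threaded service loop; not Hadoop code. */
class SingleLoopModel {
    /**
     * Replays the loop on a virtual clock and returns the times (in ms)
     * at which heartbeats were sent.
     */
    static List<Long> heartbeatTimes(long heartbeatIntervalMs,
                                     long blockReportIntervalMs,
                                     long blockReportCostMs,
                                     long runForMs) {
        List<Long> sent = new ArrayList<>();
        long now = 0;
        long lastHeartbeat = -heartbeatIntervalMs; // first heartbeat is due at time 0
        long lastBlockReport = 0;                  // first report is due after one interval
        while (now < runForMs) {
            // Step 1: send a heartbeat if one is due.
            if (now - lastHeartbeat >= heartbeatIntervalMs) {
                sent.add(now);
                lastHeartbeat = now;
            }
            // Step 2 (blockReceived) is omitted in this model.
            // Step 3: building the report blocks the loop for blockReportCostMs.
            if (now - lastBlockReport >= blockReportIntervalMs) {
                now += blockReportCostMs;
                lastBlockReport = now;
            }
            // Step 4: wait until the next heartbeat is due, if it is not already overdue.
            long nextHeartbeat = lastHeartbeat + heartbeatIntervalMs;
            if (nextHeartbeat > now) {
                now = nextHeartbeat;
            }
        }
        return sent;
    }

    /** Largest gap (in ms) between two consecutive heartbeats. */
    static long maxGap(List<Long> times) {
        long max = 0;
        for (int i = 1; i < times.size(); i++) {
            max = Math.max(max, times.get(i) - times.get(i - 1));
        }
        return max;
    }
}
```

With illustrative values of a 3000 ms heartbeat interval, a 10000 ms blockreport interval and a report that takes 60000 ms to build, maxGap(heartbeatTimes(3000, 10000, 60000, 100000)) is 60000; with a build cost of 0 it is 3000. In this model the heartbeat gap grows to the full cost of the slow block report.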

With the changes we have two threads.
Heartbeat thread:
1. The new thread sends the heartbeat and receives a {{DatanodeCommand}} in response. It queues
the command in an ArrayList.

The main thread does the following, without the previous heartbeat functionality:
1. If there are commands in the queue, process all of them.
2. If a block has been received, send a {{blockReceived}} request to the namenode.
3. If the next blockreport interval has been reached, build and send the {{blockReport}}, then
process the {{DatanodeCommand}} returned by the namenode.
4. If there are no blocks received or commands to process, wait for 1 second or until another
block is received.
5. Go back to 1.
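
The two-thread split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in 4584.patch: the class HeartbeatSketch, the NameNodeStub interface and the use of strings as commands are hypothetical, a java.util.concurrent.BlockingQueue is swapped in for the ArrayList so that no explicit locking is needed, and the wait in step 4 is placed on the command queue, which is only one possible answer to the first question in this comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the two-thread split; not the code in 4584.patch. */
class HeartbeatSketch {
    /** Stand-in for the namenode RPC; returns a command name, or null for none. */
    interface NameNodeStub {
        String sendHeartbeat();
    }

    // Assumption: a BlockingQueue replaces the ArrayList mentioned in the comment.
    private final BlockingQueue<String> commands = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private Thread heartbeatThread;

    /** Heartbeat thread: send a heartbeat, queue the returned command, sleep. */
    void startHeartbeatThread(NameNodeStub namenode, long intervalMs) {
        heartbeatThread = new Thread(() -> {
            while (running) {
                String cmd = namenode.sendHeartbeat();
                if (cmd != null) {
                    commands.add(cmd);
                }
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // stop() was called
                }
            }
        }, "heartbeat");
        heartbeatThread.setDaemon(true);
        heartbeatThread.start();
    }

    /**
     * One pass of the main loop. Step 1 drains and returns every queued command;
     * steps 2 and 3 (blockReceived, blockReport) are elided. For step 4 this
     * sketch waits on the command queue, which is an assumption.
     */
    List<String> mainLoopOnce(long waitMs) {
        List<String> batch = new ArrayList<>();
        commands.drainTo(batch);
        if (batch.isEmpty()) {
            try {
                String cmd = commands.poll(waitMs, TimeUnit.MILLISECONDS);
                if (cmd != null) {
                    batch.add(cmd);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
        }
        return batch;
    }

    /** Stops the heartbeat thread. */
    void stop() {
        running = false;
        if (heartbeatThread != null) {
            heartbeatThread.interrupt();
        }
    }
}
```

Because the heartbeat thread only sends, queues and sleeps, a slow pass through the main loop no longer holds up the next heartbeat; the cost is that commands may sit in the queue until the main thread comes back to step 1.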


Questions:
1. In step 4, should we wait for a command to arrive or for another block to be received?
2. In OfferService we process all the commands that are in the queue at once. Do you see
any issues with this?


> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes
to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode
every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly
decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories,
generates the block report, and executes the requests sent by the NameNode; the other thread
sends heartbeats and block reports and picks up the requests from the NameNode. With
these two threads, the sending of heartbeats will not be delayed by a slow block report
or slow execution of NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

