hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Thu, 26 Feb 2009 08:03:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676910#action_12676910 ]

Konstantin Shvachko commented on HADOOP-4584:
---------------------------------------------

As I said, I propose to split in-memory block reports out into a separate issue. Does anybody
disagree with that?

As for the heartbeat thread, I would like to propose an alternative approach and discuss
the pros and cons of the two.

# Now we have a single thread (call it the offerService thread) which does all three operations:
heartbeat, including processing of the commands returned from the name-node, blockReceived, and blockReport.
# Suresh's current proposal is to separate heartbeats into a new thread (heartbeat thread),
which also means creating a queue of commands returned from the name-node for processing by the
offerService thread later on.
# My proposal is to separate block report preparation into a new thread (blockReport thread),
which wakes up once an hour and prepares a block report. Once the report is ready, the offerService
thread sends it to the name-node.

I think the last proposal (3) may have an advantage over (2) because in (2) we still delay
blockReceived and the processing of commands from the name-node while the block report is
being composed.
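
To make proposal (3) concrete, here is a minimal Java sketch of the hand-off between the two
threads. It is an illustration only, not code from any of the attached patches: the class
BlockReportSketch, the methods scanBlocks, pause and run, and the string log are all
hypothetical names, and the "report" is just an array of made-up block ids. A background
blockReport thread does the slow scan and publishes the result through an AtomicReference;
the service loop keeps sending heartbeats and ships the report only once it is ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of proposal (3); none of these names are Hadoop APIs.
class BlockReportSketch {

    // Sleep helper that restores the interrupt flag instead of throwing.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for the slow scan of the data directories (assumed, not real).
    static long[] scanBlocks(long scanMillis) {
        pause(scanMillis);
        return new long[] {1L, 2L, 3L};
    }

    // Runs the service loop for a fixed number of iterations and returns
    // a log of what it "sent" to the name-node, in order.
    static List<String> run(int iterations, long heartbeatMillis, long scanMillis) {
        // Holds a prepared report until the service loop picks it up.
        AtomicReference<long[]> ready = new AtomicReference<>();

        // blockReport thread: prepares the report off the service loop.
        Thread reporter = new Thread(() -> ready.set(scanBlocks(scanMillis)), "blockReport");
        reporter.setDaemon(true);
        reporter.start();

        List<String> log = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            log.add("heartbeat");                   // never waits for the scan
            long[] report = ready.getAndSet(null);  // non-blocking check
            if (report != null) {
                log.add("blockReport(" + report.length + ")");
            }
            pause(heartbeatMillis);
        }
        try {
            reporter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        for (String entry : run(10, 20, 30)) {
            System.out.println(entry);
        }
    }
}
```

The service loop never blocks on the scan, so heartbeats (and, in a real data-node,
blockReceived and command processing) keep their normal cadence while the report is composed.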

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes
> to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode
> every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly
> decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories,
> generates the block report, and executes the requests sent by the NameNode; the other thread
> sends heartbeats and block reports and picks up the requests from the NameNode. With
> these two threads, the sending of heartbeats will not be delayed by a slow block report
> or by slow execution of NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

