hadoop-common-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10794) A hadoop cluster needs clock synchronization
Date Sun, 20 Jul 2014 16:32:39 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067976#comment-14067976
] 

Zhijie Shen commented on HADOOP-10794:
--------------------------------------

bq. Can you explain which components make this assumption, and what happens if it's violated?
So far the only example presented is statistics in YARN.

I'm not sure about the HDFS part, but the example given in the description is a problem that
happens in MR (see YARN-2251 for more details). On the YARN side, at least, NodeHealthStatus#lastHealthReportTime
is acquired at the NM, reported to the RM, and then sent on to the AM. If the clocks are
out of sync, the RM and AM will get an incorrect time for the NM health status. There may or
may not be other problematic cases. To see what really breaks, we'd better set up a cluster
with unsynchronized clocks.
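
The negative elapsed time from the issue description can be illustrated without a real cluster. The following is only a minimal sketch, not the YARN-2251 code: the `HostClock` class, the function names, and the 5-second skew are hypothetical, and it assumes the start timestamp is taken on the AM host and the finish timestamp on the task host.

```python
from dataclasses import dataclass


@dataclass
class HostClock:
    """Hypothetical wall clock of one host: true time plus a fixed offset (ms)."""
    offset_ms: int

    def now(self, true_time_ms: int) -> int:
        # What this host believes the current time is.
        return true_time_ms + self.offset_ms


def elapsed_ms(start_ts: int, end_ts: int) -> int:
    """Naive elapsed time: difference of timestamps taken on two hosts."""
    return end_ts - start_ts


def masked_elapsed_ms(start_ts: int, end_ts: int) -> int:
    """Masking approach: clamp a negative difference to zero."""
    return max(0, end_ts - start_ts)


# Assumed setup: the task host's clock is 5 s behind the AM host's clock.
am_host = HostClock(offset_ms=0)
task_host = HostClock(offset_ms=-5000)

start_ts = am_host.now(1_000_000)   # start recorded on the AM host
end_ts = task_host.now(1_002_000)   # finish recorded 2 s later on the task host

raw = elapsed_ms(start_ts, end_ts)            # skew exceeds the real duration
masked = masked_elapsed_ms(start_ts, end_ts)  # hides the symptom only
print(raw, masked)
```

Clamping removes the negative value but does not recover the real 2-second duration, which is why masking alone is not a long-term fix.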


> A hadoop cluster needs clock synchronization
> --------------------------------------------
>
>                 Key: HADOOP-10794
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10794
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>
> As a distributed system, a hadoop cluster needs the clocks on all the participating hosts
synchronized. Otherwise, problems can occur. For example, in YARN-2251, because the
clock on the host of the task container falls behind that on the host of the AM container,
the computed elapsed time (the diff between the timestamps produced on the two hosts) becomes
negative.
> In YARN-2251, we tried to mask the negative elapsed time. However, we should seek
a decent long-term solution, such as providing a mechanism to perform and check clock synchronization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
