hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9863) DataNode doesn't log any shutdown info when the process of DataNode exiting
Date Fri, 26 Feb 2016 15:21:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169178#comment-15169178 ]

Kihwal Lee commented on HDFS-9863:
----------------------------------

It was most likely killed by a signal. The DN does not have a shutdown hook, so when it
receives SIGTERM, for example, it simply exits. If it was not the system's OOM killer, someone
must have shut it down. The timing of CMS collections varies a lot with configuration. Please
move further discussion to a mailing list, as Brahma said.
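
For readers unfamiliar with the mechanism mentioned above, here is a minimal sketch of a JVM shutdown hook that records a message when the process is asked to stop. It is not DataNode code: the class `ShutdownLogSketch`, its methods, and the message text are hypothetical names chosen for illustration. Only `Runtime.addShutdownHook` and `Runtime.removeShutdownHook` are standard JDK calls.

```java
import java.util.function.Consumer;

public class ShutdownLogSketch {
    // Hypothetical message format, made up for this example.
    public static String shutdownMessage(String daemon, String host) {
        return "shutting down " + daemon + " on " + host;
    }

    // Registers a JVM shutdown hook that passes the message to `sink`.
    // Hooks run on normal exit and on SIGTERM/SIGINT; they cannot run on
    // SIGKILL, which is what the Linux OOM killer sends.
    public static Thread installHook(String daemon, String host, Consumer<String> sink) {
        Thread hook = new Thread(
                () -> sink.accept(shutdownMessage(daemon, host)),
                "shutdown-logger");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // The hook fires when main returns and the JVM exits; a long-running
        // process that receives SIGTERM goes through the same path.
        installHook("DataNode", "example-host", System.err::println);
    }
}
```

With a hook like this, a SIGTERM would leave a trace in the log, while a SIGKILL would still end the process silently, so a missing message alone does not tell the two cases apart.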

> DataNode doesn't log any shutdown info when the process of DataNode exiting
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-9863
>                 URL: https://issues.apache.org/jira/browse/HDFS-9863
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>         Attachments: datanode-restart_after.gc.log, datanode-restart_before.gc.log, datanode.log
>
>
> One of my datanodes exited without any shutdown info. 
> {code}
> 2016-02-25 14:46:00,283 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1942012336-XX.XX.2.191-1406726500544:blk_1730224536_658031130, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2016-02-25 15:03:55,639 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = XX.XX6032/XX.XX.6.32
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 2.7.1
> {code}
> I suspected that a full GC caused this problem, so I looked at the DataNode GC log. There
> is a CMS GC, but it occurred after the DataNode restart time.
> {code}
> 2016-02-25T15:03:57.930+0800: 2.756: [GC2016-02-25T15:03:57.930+0800: 2.756: [ParNew: 1677824K->24417K(1887488K), 0.0249280 secs] 1677824K->24417K(8178944K), 0.0251010 secs] [Times: user=0.24 sys=0.07, real=0.02 secs]
> 2016-02-25T15:12:46.498+0800: 531.324: [GC [1 CMS-initial-mark: 0K(6291456K)] 780481K(8178944K), 0.0554170 secs] [Times: user=0.06 sys=0.00, real=0.07 secs]
> 2016-02-25T15:12:46.567+0800: 531.393: [CMS-concurrent-mark-start]
> 2016-02-25T15:12:46.574+0800: 531.400: [CMS-concurrent-mark: 0.006/0.007 secs] [Times: user=0.07 sys=0.02, real=0.01 secs]
> 2016-02-25T15:12:46.574+0800: 531.400: [CMS-concurrent-preclean-start]
> 2016-02-25T15:12:46.589+0800: 531.415: [CMS-concurrent-preclean: 0.015/0.015 secs] [Times: user=0.16 sys=0.06, real=0.01 secs]
> {code}
> So this does not seem to be the main cause. GC activity before the DataNode exited looks normal.
> {code}
> 2016-02-25T14:45:39.743+0800: 5431411.796: [GC2016-02-25T14:45:39.743+0800: 5431411.796: [ParNew: 1686799K->22696K(1887488K), 0.0385700 secs] 2908579K->1244476K(8178944K) icms_dc=0 , 0.0388280 secs] [Times: user=0.23 sys=0.01, real=0.04 secs]
> {code}
> So this is confusing. I have attached the complete GC logs and the DataNode log.
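
The timestamp comparison in the description can be checked mechanically. The sketch below parses the timestamps quoted above and computes two gaps: from the last DataNode log line to the STARTUP_MSG, and from the STARTUP_MSG to the first CMS initial mark. The class `LogGapSketch` and its methods are hypothetical helpers, and the code assumes the DataNode log uses the same +0800 local time as the GC log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class LogGapSketch {
    // DataNode log timestamps look like "2016-02-25 14:46:00,283".
    private static final DateTimeFormatter DN_LOG =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    // GC log timestamps look like "2016-02-25T15:12:46.498+0800".
    private static final DateTimeFormatter GC_LOG =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    public static LocalDateTime parseDataNodeTime(String s) {
        return LocalDateTime.parse(s, DN_LOG);
    }

    // Assumption: both logs share the same local time zone, so the offset is dropped.
    public static LocalDateTime parseGcTime(String s) {
        return OffsetDateTime.parse(s, GC_LOG).toLocalDateTime();
    }

    public static long millisBetween(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        LocalDateTime lastLine = parseDataNodeTime("2016-02-25 14:46:00,283");
        LocalDateTime startup = parseDataNodeTime("2016-02-25 15:03:55,639");
        LocalDateTime cmsMark = parseGcTime("2016-02-25T15:12:46.498+0800");

        System.out.println("silent gap before restart (ms): " + millisBetween(lastLine, startup));
        System.out.println("restart to CMS initial mark (ms): " + millisBetween(startup, cmsMark));
    }
}
```

On the quoted values, the log is silent for roughly 17 minutes 55 seconds before the restart, and the CMS cycle begins almost 9 minutes after it, which is consistent with the reporter's conclusion that this CMS cycle could not have caused the exit.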



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
