hadoop-hdfs-issues mailing list archives

From "star (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
Date Mon, 20 May 2019 10:08:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843829#comment-16843829
] 

star edited comment on HDFS-12914 at 5/20/19 10:07 AM:
-------------------------------------------------------

[~smarella] how many DNs do you have?  Judging from the limited logs, I think it is caused
by the following case: high CPU load on the SNN delayed the processing of full block reports.

 
||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|process Report|

 

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010
were filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. Only at 2019-05-16
15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full
block report lease ID was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).

I don't know exactly when the full block report lease ID was obtained from the server, because there is no INFO log for it.
I guess it was about 5 minutes before the first failed report, say 15:26:29.
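
The expiry reasoning above can be sketched as plain timestamp arithmetic. This is only an illustration under stated assumptions, not the actual BlockReportLeaseManager code: the class and method names are hypothetical, the 5-minute constant mirrors the default named above, and the lease issue time is an assumed value (6 minutes before processing began), since the logs do not record it.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Sketch only: models lease expiry with wall-clock arithmetic; not the real NameNode code. */
final class LeaseExpirySketch {
    // Mirrors the 5-minute default (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).
    static final long LEASE_LENGTH_MS = 5L * 60L * 1000L;

    /** Returns true if the report is processed after the lease has run out. */
    static boolean isExpired(LocalDateTime leaseIssued, LocalDateTime reportProcessed) {
        long heldMs = Duration.between(leaseIssued, reportProcessed).toMillis();
        return heldMs > LEASE_LENGTH_MS;
    }

    public static void main(String[] args) {
        // Assumed issue time (not from the logs): 6 minutes before processing began.
        LocalDateTime issued = LocalDateTime.of(2019, 5, 16, 15, 25, 11);
        // Time at which the SNN began processing reports from that DN.
        LocalDateTime processed = LocalDateTime.of(2019, 5, 16, 15, 31, 11);
        System.out.println("expired = " + isExpired(issued, processed));
    }
}
```

With these assumed timestamps the lease is held for 6 minutes, so the check reports it as expired. A lease issued less than 5 minutes before processing would still be valid.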

 



> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for conditions such
as "unknown datanode", "not in pending set", "lease has expired", wrong lease id, etc.  Lease
rejection does not throw an exception.  It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}}
and is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease becomes active with
_no blocks_.  A replication storm ensues, possibly causing DNs to temporarily go dead (HDFS-12645),
leading to more FBR lease rejections on re-registration.  The cluster will have many "missing
blocks" until each DN's next FBR is sent and/or forced.
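
The failure mode described in the issue can be illustrated with a toy model. This is a minimal sketch, not the real NameNode classes: `Node`, `checkLease`, and `blockReport` here are hypothetical stand-ins with simplified signatures, meant only to show how a rejection signalled by a bare `false` leaves a registered node with no recorded blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: a toy model of the reported behaviour, not the real NameNode code. */
final class RejectedReportSketch {
    /** Hypothetical stand-in for a DataNode as the NameNode sees it. */
    static final class Node {
        boolean registered;
        final List<Long> knownBlocks = new ArrayList<>();
    }

    /** Hypothetical lease check: false means "reject"; no exception is thrown. */
    static boolean checkLease(long presentedLeaseId, long validLeaseId) {
        return presentedLeaseId != 0 && presentedLeaseId == validLeaseId;
    }

    /**
     * Toy blockReport: on a bad lease the blocks are never recorded, and the
     * caller only sees false, which reads the same as "stale storages remain".
     */
    static boolean blockReport(Node node, List<Long> blocks, long leaseId, long validLeaseId) {
        boolean noStaleStorages = false;
        if (checkLease(leaseId, validLeaseId)) {
            node.knownBlocks.addAll(blocks);
            noStaleStorages = true;
        }
        return noStaleStorages;
    }

    public static void main(String[] args) {
        Node dn = new Node();
        dn.registered = true; // re-registration succeeded
        // The report carries a lease id (41) that does not match the valid one (42).
        boolean result = blockReport(dn, Arrays.asList(1L, 2L, 3L), 41L, 42L);
        System.out.println("result=" + result + ", blocks known=" + dn.knownBlocks.size());
    }
}
```

In this model the rejected report leaves the node registered but with zero known blocks, and nothing in the return value tells the caller that the lease was the reason.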




