hadoop-hdfs-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13977) NameNode can kill itself if it tries to send too many txns to a QJM simultaneously
Date Thu, 13 Dec 2018 00:21:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719582#comment-16719582 ]

Erik Krogen commented on HDFS-13977:
------------------------------------

It's not clear to me whether this bug will still be present after HDFS-10220; we need to investigate further.

> NameNode can kill itself if it tries to send too many txns to a QJM simultaneously
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13977
>                 URL: https://issues.apache.org/jira/browse/HDFS-13977
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.7.7
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> h3. Problem & Logs
> We recently encountered an issue on a large cluster (running 2.7.4) in which the NameNode killed itself because it was unable to communicate with the JNs via QJM. We discovered that it was the result of the NameNode trying to send a huge batch of over 1 million transactions to the JNs in a single RPC:
> {code:title=NameNode Logs}
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal X.X.X.X:XXXX failed to write txns 10000000-11153636. Will try to write to this JN again after the next log roll.
> ...
> WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 1098ms to send a batch of 1153637 edits (335886611 bytes) to remote journal X.X.X.X:XXXX
> {code}
> {code:title=JournalNode Logs}
> INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8485: readAndProcess from client X.X.X.X threw exception [java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X]
> java.io.IOException: Requested data length 335886776 is longer than maximum configured RPC length 67108864. RPC came from X.X.X.X
>         at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:897)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:753)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:724)
> {code}
> The JournalNodes rejected the RPC because it had a size well over the 64MB default {{ipc.maximum.data.length}}.
> This was triggered by a huge number of files all hitting a hard lease timeout simultaneously, causing the NN to force-close them all at once. This can be a particularly nasty bug because the NN will attempt to re-send the same huge RPC on restart, since it loads an fsimage which still has all of these open files that need to be force-closed.
> h3. Proposed Solution
> To solve this we propose to modify {{EditsDoubleBuffer}} to add a "hard limit" based on the value of {{ipc.maximum.data.length}}. When {{writeOp()}} or {{writeRaw()}} is called, first check the size of {{bufCurrent}}. If it exceeds the hard limit, block the writer until the buffer is flipped and {{bufCurrent}} becomes {{bufReady}}. This gives some self-throttling to prevent the NameNode from killing itself in this way.
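
To make the proposed throttling concrete, here is a minimal, self-contained sketch of the idea: a double buffer whose write path blocks once the current buffer grows past a hard limit (e.g. derived from {{ipc.maximum.data.length}}), and which releases blocked writers when the buffers are flipped. The class below ({{BoundedDoubleBuffer}}) and its methods are illustrative only; they mirror the names used in the description but are not the actual {{EditsDoubleBuffer}} implementation or the patch for this issue.

{code:title=Illustrative sketch only (hypothetical class, not the actual patch)}
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical double buffer that self-throttles writers: a write blocks while
 * the current buffer is over a hard limit (e.g. derived from
 * ipc.maximum.data.length, 64MB by default) and resumes once the buffers are
 * flipped by the flusher thread.
 */
public class BoundedDoubleBuffer {
  private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
  private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
  private final int hardLimitBytes;

  public BoundedDoubleBuffer(int hardLimitBytes) {
    this.hardLimitBytes = hardLimitBytes;
  }

  /** Analogous to writeRaw(): block while the write would exceed the limit. */
  public synchronized void writeRaw(byte[] bytes, int off, int len)
      throws InterruptedException {
    // If the buffer already holds data and this write would push it past the
    // hard limit, wait until setReadyToFlush() swaps the buffers. A single
    // oversized record is still admitted into an empty buffer to avoid
    // blocking forever.
    while (bufCurrent.size() > 0 && bufCurrent.size() + len > hardLimitBytes) {
      wait();
    }
    bufCurrent.write(bytes, off, len);
  }

  /** Analogous to setReadyToFlush(): swap buffers and wake blocked writers. */
  public synchronized void setReadyToFlush() {
    ByteArrayOutputStream tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;
    bufCurrent.reset();
    notifyAll(); // writers blocked in writeRaw() may now proceed
  }

  /** The flusher would serialize this batch into one RPC per JournalNode. */
  public synchronized byte[] getReadyContents() {
    return bufReady.toByteArray();
  }
}
{code}

Because {{bufCurrent}} is never allowed to grow far past the hard limit, the batch handed to the flusher (and hence each edits RPC sent to the JNs) stays near the IPC maximum instead of ballooning to hundreds of megabytes as in the logs above.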



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


