hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
Date Tue, 16 Jun 2015 21:36:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588819#comment-14588819 ]

Colin Patrick McCabe commented on HDFS-7923:

This change is important for avoiding cascading failures (aka congestion collapse).  Currently,
when the NN gets too many full block reports at once, the extra block reports slow down the
processing of the existing ones (because storing the large RPCs generates GC activity, up to
and including full GCs).  So you get into a negative spiral: can't process FBRs fast enough?
Then you get some more FBRs, which slow you down even more.  And so on.  Keep in mind that with
the previous code, the DN would send its full block report all over again if the NN didn't
respond within some timeout, which could lead to the NN having multiple (large) copies of
the same full block report queued up.  It's true that you could usually avoid these scenarios
by careful configuration and tuning, but this kind of fragile congestion collapse behavior
should not be in the system.  This change is also important for maintaining any sort of reasonable
quality of service on the NN, since otherwise we can get completely flooded with FBRs and
can't do any other work.
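
To make the handshake from the issue description concrete, here is a minimal Java sketch of the NN side: a heartbeat carries a "may I send an FBR?" flag, and the NN answers yes only while fewer than a fixed number of reports are outstanding.  The names FbrGate, onHeartbeat, onFbrProcessed and maxConcurrent are hypothetical and are not taken from the patch; the actual implementation may look quite different.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical NN-side gate (illustration only, not the HDFS-7923 code).
 * Grants permission for at most maxConcurrent full block reports at a time.
 */
class FbrGate {
    private final int maxConcurrent;
    // IDs of DNs that currently hold permission to send an FBR.
    private final Set<String> granted = new HashSet<>();

    FbrGate(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /**
     * Handle one heartbeat.  requestFbr stands in for the optional boolean
     * the DN sets; the return value stands in for the optional boolean in
     * the NN's heartbeat response.
     */
    synchronized boolean onHeartbeat(String dnId, boolean requestFbr) {
        if (!requestFbr) {
            return false;              // DN did not ask, nothing to grant
        }
        if (granted.contains(dnId)) {
            return true;               // already granted; do not count it twice
        }
        if (granted.size() >= maxConcurrent) {
            return false;              // NN is busy; DN should ask again later
        }
        granted.add(dnId);
        return true;
    }

    /** Called once the NN has finished processing the DN's FBR. */
    synchronized void onFbrProcessed(String dnId) {
        granted.remove(dnId);
    }
}
```

A DN that gets back false simply keeps heartbeating and asks again later, so the NN never has more than maxConcurrent reports in flight.  This sketch leaves out expiry of permissions for DNs that are granted one but never send their report, which a real implementation would need to handle.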

> The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
> -----------------------------------------------------------------------------------------------
>                 Key: HDFS-7923
>                 URL: https://issues.apache.org/jira/browse/HDFS-7923
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.8.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: 2.8.0
>         Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, HDFS-7923.006.patch, HDFS-7923.007.patch
>
> The DataNodes should rate-limit their full block reports.  They can do this by first
> sending a heartbeat message to the NN with an optional boolean set which requests permission
> to send a full block report.  If the NN responds with another optional boolean set, the DN
> will send an FBR... if not, it will wait until later.  This can be done compatibly with optional

