hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7960) The full block report should prune zombie storages even if they're not empty
Date Sat, 21 Mar 2015 00:54:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372404#comment-14372404 ]

Colin Patrick McCabe commented on HDFS-7960:

bq. there's a TODO: FIXME, we aren't passing in the BlockReportContext.

Yeah, mea culpa.

bq. processReport doesn't need that last parameter anymore either I think, since the information
is in the BR context.

The last parameter is needed because we want to eliminate zombie storages only after all storages
have been processed, and a single call to {{NameNodeRpcServer#blockReport}} can handle multiple
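
A rough sketch of that ordering, for illustration only: every storage named in a report is stamped with the report's ID, and pruning runs only once the caller signals that the last storage of the report has been handled. {{StorageTracker}}, its method names, and the {{lastStorageInReport}} flag are hypothetical stand-ins, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the NameNode's per-datanode storage bookkeeping.
final class StorageTracker {
    // storage ID -> ID of the last block report that mentioned it.
    // 0 means "never reported"; the sketch assumes real report IDs are never 0.
    private final Map<String, Long> lastReportId = new TreeMap<>();

    void addStorage(String storageId) {
        lastReportId.putIfAbsent(storageId, 0L);
    }

    boolean contains(String storageId) {
        return lastReportId.containsKey(storageId);
    }

    // Handles one storage of a block report. Zombies are pruned only when
    // the caller says this was the last storage of the report, so a storage
    // that simply has not been processed yet is never mistaken for a zombie.
    List<String> processReport(String storageId, long reportId,
                               boolean lastStorageInReport) {
        lastReportId.put(storageId, reportId);
        List<String> pruned = new ArrayList<>();
        if (!lastStorageInReport) {
            return pruned;
        }
        Iterator<Map.Entry<String, Long>> it = lastReportId.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue().longValue() != reportId) {
                // Not mentioned in this report: treat it as a zombie,
                // whether or not it still appears to hold blocks.
                pruned.add(e.getKey());
                it.remove();
            }
        }
        return pruned;
    }
}
```

If pruning ran after the first storage instead, every storage not yet processed in the same report would look stale, which is why the flag is passed in separately here.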

bq. Is there a need for BR ids to be monotonic increasing? Else using a random number seems
better. I see you do a fixup by checking with the previous ID, but with random this shouldn't
be necessary

I like the idea of monotonically increasing BR IDs for two reasons: it makes it easier to see
in the logs which block report came after which, and it effectively removes the
(admittedly very, very small) chance of a collision between two subsequent BR IDs.  The monotonic
timer in Linux (or other OS) only gets reset when a node reboots, so even restarting the DN
process will not normally reset the ID.
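
For illustration, a minimal sketch of that scheme, including the fixup against the previous ID mentioned in the question. The class name {{BlockReportIdGenerator}} is hypothetical, and using {{System.nanoTime()}} as the monotonic clock is an assumption of the sketch rather than a statement about what the patch does.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not the class from the patch.
final class BlockReportIdGenerator {
    private final LongSupplier clock;
    private long prevId;
    private boolean hasPrev;

    // The clock is injectable so the fixup path can be exercised in a test.
    BlockReportIdGenerator(LongSupplier clock) {
        this.clock = clock;
    }

    // Assumption: System.nanoTime() stands in for the OS monotonic timer.
    BlockReportIdGenerator() {
        this(System::nanoTime);
    }

    // Returns an ID strictly greater than the one returned before it.
    synchronized long next() {
        long id = clock.getAsLong();
        if (hasPrev && id <= prevId) {
            // The clock did not advance (or appeared to go backwards):
            // bump past the previous ID to keep the sequence increasing.
            id = prevId + 1;
        }
        prevId = id;
        hasPrev = true;
        return id;
    }
}
```

With a random ID the comparison against {{prevId}} would be unnecessary, but the IDs would no longer sort in report order in the logs.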

bq. If you wanted to add comments about all this, BlockReportContext's class javadoc would
be a good choice.

Good idea, I added some comments there.

bq. space after assert


> The full block report should prune zombie storages even if they're not empty
> ----------------------------------------------------------------------------
>                 Key: HDFS-7960
>                 URL: https://issues.apache.org/jira/browse/HDFS-7960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch, HDFS-7960.004.patch
> The full block report should prune zombie storages even if they're not empty.  We have
> seen cases in production where zombie storages have not been pruned subsequent to HDFS-7575.
> This could arise any time the NameNode thinks there is a block in some old storage which
> is actually not there.  In this case, the block will not show up in the "new" storage (once
> old is renamed to new) and the old storage will linger forever as a zombie, even with the
> HDFS-7596 fix applied.  This also happens with datanode hotplug, when a drive is removed.
> In this case, an entire storage (volume) goes away but the blocks do not show up in another
> storage on the same datanode.

This message was sent by Atlassian JIRA
