hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10999) Introduce separate stats for Replicated and Erasure Coded Blocks apart from the current Aggregated stats
Date Tue, 09 May 2017 18:53:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003294#comment-16003294
] 

Manoj Govindassamy edited comment on HDFS-10999 at 5/9/17 6:52 PM:
-------------------------------------------------------------------

Thanks [~tasanuma0829] and [~andrew.wang] for the detailed review. Attaching v03 patch to
address the following comments and bunch of renamings. While you review this revision, i will
look at more tests that can be added to verify the missing cases.

bq. BlockManagerSafeMode: How about using LongAccumulator for numberOfBytesInFutureBlocks,
too?
Done. numberOfBytesInFutureBlocks is no more needed as it can be derived from bytesInFutureReplicatedBlocks
and bytesInFutureStripedBlocks. numberOfBytesInFutureBlocks is now removed.

bq. CorruptReplicasMap: decrementBlockStat be included in the if statement?Done
    
bq. It seems package private is enough for new methods getCorruptReplicatedBlocksStat and
getCorruptStripedBlocksStat.
 Done 

bq. LowRedundancyBlocks: Looks like corruptReplicatedOneBlocks is same as corruptReplOneBlocks.
How about reusing corruptReplOneBlocks instead of calculating corruptReplicatedOneBlocks?
Done. To be consistent with newly introdcued stats, removed corruptReplOneBlocks and used
corruptReplicatedOneBlocks

bq. InvalidateBlocks: Maybe I should have asked earlier, but if it is not much trouble for
you, how about doing InvalidateBlocks-related work as a follow-on task?
Simplified the code by using two maps instead of 1. Still there are quite a few new changes.
Tried reverting this file change, but it ended up making the patch inconsistent for few stats.
Please take a look at the new version one more time.

bq. ReplicatedBlocksStatsMBean: the equivalent of getCorruptBlocks is now called getCorruptReplicaBlocksStat.
Why not getCorruptBlocksStat instead?
Done. Renamed the methods as suggested.
    
bq. Should we rename getUnderReplicatedBlocksStat to getLowRedundancyReplicatedBlocksStat
or similar to standardize with the EC naming? Since we deprecated getUnderReplicatedBlocks
in favor of getLowRedundancyBlocks, it seems odd to bring this same name back here.
Done. Renamed this to getLowRedundancyBlocksStat.

bq. ECBlockGroupsStatusMBean: the noun is an "EC block group", should we name these e.g. getLowRedundancyECBlockGroupsStat,
getCorruptECBlockGroupsStat, etc.? We could also shorten from "ECBlockGroup" to just "BlockGroup"
if you think the MBean name by itself is sufficient documentation.
Done. Followed the approach used in fsck as suggested by Takanobu and renamed all these methods
to have "ECBlockGroups"

bq. Misc: Any reason you chose to use LongAccumulator rather than LongAdder everywhere?
Done. LongAdder internally uses LongAccumulator. Anyway, switched to LongAdder as it is far
easier to work with than the other.

bq. DFSClient and DistributedFileSystem aren't public APIs, so we don't need to preserve getUnderReplicatedBlocksCount
if these methods are unused. Can move DFSAdmin over to using the new call.
Done.

bq. CorruptReplicasMap: Nit: Please put the new increment and decrement functions next to
each other for clarity
Done

bq. In this block of code, should the decrement be moved inside the isEmpty case? Testing
?
Done. Tests yet to be added for this particular case.

bq. InvalidateBlocks: A lot of the new code is because of the indexing into the new array
in the map and maintaining the separate counts. If we instead add another map, e.g. nodeToBlockGroups,
we could avoid this.
Done.

bq. Can we simplify the limiting code on 297 ..?
Done.

bq. LowRedundancyBlocks: Rename from StripedBlocks to instead StripedBlockGroups?
Done. To be consistent with other places, I used EC prefix instead of Striped, but also included
BlockGroups. 

bq. We're duplicating the corruptReplOneBlocks if statement, let's either move handling corruptReplicatedOneBlocks
out of incrementBlockStat, or also increment corruptReplOneBlocks inside incrementBlockStat.
Same comment for remove
Removed corruptReplOneBlocks as it is handled by corruptReplicatedOneBlocks fully.



was (Author: manojg):
Thanks [~tasanuma0829] and [~andrew.wang] for the detailed review. Attaching v03 patch to
address the following comments and bunch of renamings. While you review this revision, i will
look at more tests that can be added to verify the missing cases.

bq. BlockManagerSafeMode: How about using LongAccumulator for numberOfBytesInFutureBlocks,
too?
Done. numberOfBytesInFutureBlocks is no more needed as it can be derived from bytesInFutureReplicatedBlocks
and bytesInFutureStripedBlocks. numberOfBytesInFutureBlocks is now removed.

bq. CorruptReplicasMap: decrementBlockStat be included in the if statement?Done
    
bq. It seems package private is enough for new methods getCorruptReplicatedBlocksStat and
getCorruptStripedBlocksStat.
 Done 

bq. LowRedundancyBlocks: Looks like corruptReplicatedOneBlocks is same as corruptReplOneBlocks.
How about reusing corruptReplOneBlocks instead of calculating corruptReplicatedOneBlocks?
Done. To be consistent with newly introdcued stats, removed corruptReplOneBlocks and used
corruptReplicatedOneBlocks

bq. InvalidateBlocks: Maybe I should have asked earlier, but if it is not much trouble for
you, how about doing InvalidateBlocks-related work as a follow-on task?
Simplified the code by using two maps instead of 1. Still there are quite a few new changes.
Tried reverting this file change, but it ended up making the patch inconsistent for few stats.
Please take a look at the new version one more time.

bq. ReplicatedBlocksStatsMBean: the equivalent of getCorruptBlocks is now called getCorruptReplicaBlocksStat.
Why not getCorruptBlocksStat instead?
Done. Renamed the methods as suggested.
    
bq. Should we rename getUnderReplicatedBlocksStat to getLowRedundancyReplicatedBlocksStat
or similar to standardize with the EC naming? Since we deprecated getUnderReplicatedBlocks
in favor of getLowRedundancyBlocks, it seems odd to bring this same name back here.
Done. Renamed this to getLowRedundancyBlocksStat.

bq. ECBlockGroupsStatusMBean: the noun is an "EC block group", should we name these e.g. getLowRedundancyECBlockGroupsStat,
getCorruptECBlockGroupsStat, etc.? We could also shorten from "ECBlockGroup" to just "BlockGroup"
if you think the MBean name by itself is sufficient documentation.
Done. Followed the approach used in fsck as suggested by Takanobu and renamed all these methods
to have "ECBlockGroups"

bq. Misc: Any reason you chose to use LongAccumulator rather than LongAdder everywhere?
Done. LongAdder internally uses LongAccumulator. Anyway, switched to LongAdder as it is far
easier to work with than the other.

bq. DFSClient and DistributedFileSystem aren't public APIs, so we don't need to preserve getUnderReplicatedBlocksCount
if these methods are unused. Can move DFSAdmin over to using the new call.
Done.

bq. CorruptReplicasMap: Nit: Please put the new increment and decrement functions next to
each other for clarity
Done

bq. In this block of code, should the decrement be moved inside the isEmpty case? Testing
?
Done. Tests yet to be added for this particular case.

bq. InvalidateBlocks: A lot of the new code is because of the indexing into the new array
in the map and maintaining the separate counts. If we instead add another map, e.g. nodeToBlockGroups,
we could avoid this.
Done.

bq. Can we simplify the limiting code on 297 ..?
DONE

bq. LowRedundancyBlocks: Rename from StripedBlocks to instead StripedBlockGroups?
Done. To be consistent with other places, I used EC prefix instead of Striped, but also included
BlockGroups. 

bq. We're duplicating the corruptReplOneBlocks if statement, let's either move handling corruptReplicatedOneBlocks
out of incrementBlockStat, or also increment corruptReplOneBlocks inside incrementBlockStat.
Same comment for remove
Removed corruptReplOneBlocks as it is handled by corruptReplicatedOneBlocks fully.


> Introduce separate stats for Replicated and Erasure Coded Blocks apart from the current
Aggregated stats
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have, supportability
>         Attachments: HDFS-10999.01.patch, HDFS-10999.02.patch, HDFS-10999.03.patch
>
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic term "low
redundancy" to the old-fashioned "under replicated". But this term is still being used in
messages in several places, such as web ui, dfsadmin and fsck. We should probably change them
to avoid confusion.
> File this jira to discuss it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message