hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Charles Lamb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
Date Thu, 26 Feb 2015 13:37:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338394#comment-14338394
] 

Charles Lamb commented on HDFS-7836:
------------------------------------

Hi [~arpit99],

Thanks for reading over the design doc and commenting on it.

bq. The DataNode can now split block reports per storage directory (post HDFS-2832), controlled
by DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY. Did you get a chance to try it out and see if it helps?
Splitting reports addresses all of the above. (edit: does not address network bandwidth gains
from compression though)

I think you may mean your work on HDFS-5153, right? If I understand that correctly, it sends
one report per storage. We have seen block reports in the 100MB+ sizes so we suspect that
an even small chunksize than a storage may yield benefits. That said, I am also watching [~daryn]'s
work on HDFS-7435 which addresses a lot of this piece of this Jira's proposal. I think that
once HDFS-7435 is committed, we will make some measurements and see if anything else in the
area of chunking is necessary. As you point out, compression should also help.

bq. Do you have any estimates for startup time overhead due to GCs?

We know of at least one large deployment which experiences a full GC pause during startup.
I'm not sure of the time, but in general, the off-heaping will help with NN throughput just
by reducing the number of objects on the heap.

bq. How does this affect block report processing? We cannot assume DataNodes will sort blocks
by target stripe. Will the NameNode sort received reports or will it acquire+release a lock
per block? If the former, then there should probably be some randomization of order across
threads to avoid unintended serialization e.g. lock convoys.

The idea is that currently, processing a block report requires taking the FSN lock. So this
proposal is two part. First, use better locking semantics so that we don't have to take the
FSN lock. Next, shard the blocksMap structure so that multiple threads can operate concurrently
on that structure. Even if we continue to process BRs under one big happy FSN lock, having
multiple threads operate concurrently will yield benefits. The sharding ("stripes") is along
arbitrary boundaries. For instance, the design doc suggests that it could be striped by doing
blockId % nStripes. nStripes would be configurable to a relatively small number (the dd suggests
4 to 16), and if the modulo calculation is used, then nStripes would be a prime that is roughly
equal to the number of threads available. As long as block report processing per block does
not need to access more than one shard at a time, this will be fine -- multiple threads can
process blocks in parallel. It is a technique that Berkeley DB Java Edition uses for its lock
table to improve concurrency.


> BlockManager Scalability Improvements
> -------------------------------------
>
>                 Key: HDFS-7836
>                 URL: https://issues.apache.org/jira/browse/HDFS-7836
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message