hadoop-hdfs-issues mailing list archives

From "Gera Shegalov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8182) Implement topology-aware CDN-style caching
Date Tue, 21 Apr 2015 06:41:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504444#comment-14504444 ]

Gera Shegalov commented on HDFS-8182:

Hi Andrew,

I think the said block placement policy works fine for data whose usage we know a priori, such
as binaries in the YARN-1492 Shared Cache (a few terabytes in our case), MR/Spark staging
directories, etc. For such cases we, or the frameworks, already set a high replication factor,
and a solution with rf = #racks is good enough, except for the race between replication speed
and YARN scheduling, which the approach proposed in this JIRA would eliminate.

In some cases we have no a priori knowledge. The most prominent example is primary or temporary
files used ad hoc as the build input of a hash join. Having a solution that works transparently,
irrespective of the specified replication factor, is a win.

Another drawback of a block-placement-based solution (besides currently being global rather
than per-file) is that it is not elastic and is oblivious to data temperature. I think this
JIRA would cover both families of cases above well.

> Implement topology-aware CDN-style caching
> ------------------------------------------
>                 Key: HDFS-8182
>                 URL: https://issues.apache.org/jira/browse/HDFS-8182
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.6.0
>            Reporter: Gera Shegalov
> To scale reads of hot blocks in large clusters, it would be beneficial if we could read
a block across the ToR switches only once. Example scenarios are localization of binaries,
MR distributed cache files for map-side joins, and similar. There are multiple layers where
this could be implemented (a YARN service, or individual apps such as MR), but I believe it is
best done in HDFS, or even in the common FileSystem layer, to support as many use cases as possible.
> The life cycle could look like this e.g. for the YARN localization scenario:
> 1. inputStream = fs.open(path, ..., CACHE_IN_RACK)
> 2. instead of reading from a remote DN directly, the NN tells the client to read via its
local DN1, and DN1 creates a replica of each block.
> When the next localizer on DN2 in the same rack starts, it will learn from the NN about the
replica on DN1, and the client will read from DN1 via the conventional path.
> When the application ends, the AM or NMs can notify the NN in an fadvise-DONTNEED style,
and the NN can then start telling DNs to discard the extraneous replicas.
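The proposed lifecycle can be sketched as a small, self-contained simulation of the NN-side decision: the first reader in a rack pulls the block through its local DN (crossing the ToR once), later readers in that rack are served in-rack, and a DONTNEED-style hint drops the extra replicas. Everything here is illustrative: `RackCacheSim`, `openCacheInRack`, and `dontNeed` are hypothetical names standing in for the proposed CACHE_IN_RACK open flag and the fadvise-DONTNEED hint, not real HDFS APIs.

```java
import java.util.*;

// Hypothetical sketch of the CACHE_IN_RACK lifecycle; not a real HDFS API.
class RackCacheSim {
    // block -> DNs currently holding a replica (original + rack-cached)
    private final Map<String, Set<String>> replicas = new HashMap<>();
    // replicas added by rack-caching, so a DONTNEED hint can discard them
    private final Map<String, Set<String>> cachedExtras = new HashMap<>();
    private final Map<String, String> rackOf; // DN -> rack (topology)

    RackCacheSim(Map<String, String> rackOf) { this.rackOf = rackOf; }

    void addBlock(String block, String dn) {
        replicas.computeIfAbsent(block, b -> new HashSet<>()).add(dn);
    }

    // NN-side decision for fs.open(path, ..., CACHE_IN_RACK): if a replica
    // already exists in the client's rack, use the conventional read path;
    // otherwise read via the client's local DN, which keeps a cached replica.
    String openCacheInRack(String block, String clientDn) {
        String rack = rackOf.get(clientDn);
        for (String dn : replicas.get(block)) {
            if (rack.equals(rackOf.get(dn))) return dn; // in-rack read
        }
        // first reader in this rack: cross the ToR once via the local DN
        replicas.get(block).add(clientDn);
        cachedExtras.computeIfAbsent(block, b -> new HashSet<>()).add(clientDn);
        return clientDn;
    }

    // fadvise-DONTNEED style hint: discard the extraneous cached replicas
    void dontNeed(String block) {
        Set<String> extras = cachedExtras.remove(block);
        if (extras != null) replicas.get(block).removeAll(extras);
    }

    int replicaCount(String block) { return replicas.get(block).size(); }

    public static void main(String[] args) {
        Map<String, String> topo = new HashMap<>();
        topo.put("dn0", "rackA");
        topo.put("dn1", "rackB");
        topo.put("dn2", "rackB");
        RackCacheSim nn = new RackCacheSim(topo);
        nn.addBlock("blk_1", "dn0"); // single replica, in rackA

        // first rackB reader caches via its local DN; second reads in-rack
        System.out.println(nn.openCacheInRack("blk_1", "dn1")); // dn1
        System.out.println(nn.openCacheInRack("blk_1", "dn2")); // dn1
        System.out.println(nn.replicaCount("blk_1"));           // 2

        nn.dontNeed("blk_1"); // application ended; drop the extra replica
        System.out.println(nn.replicaCount("blk_1"));           // 1
    }
}
```

Note the design point this illustrates: the extra replica is tracked separately from the original placement, so the DONTNEED hint only discards cache copies and never touches the durable replicas.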

This message was sent by Atlassian JIRA
