hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system
Date Tue, 07 Jun 2016 18:16:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319050#comment-15319050 ]

Zhe Zhang commented on HDFS-9806:
---------------------------------

Another thought is, maybe we can leverage caching policies and consistency models from NFS?
Fundamentally, each "small HDFS" is like an NFS client, and the "big external store" is like
the NFS server.

E.g. maybe we can use lease-based locking to prevent conflicting updates to the same subtree.
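
To make the lease idea concrete, here is a rough sketch, not a proposal for the actual implementation. It assumes the external store (playing the NFS server role) keeps a table of time-limited leases keyed by subtree path, and refuses a lease that overlaps one held by a different small HDFS. The names {{SubtreeLeaseTable}}, {{acquire}} and {{release}} are made up for illustration and are not part of HDFS or NFS.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    holder: str        # which "small HDFS" owns the lease
    expires_at: float  # clock value after which the lease is void


class SubtreeLeaseTable:
    """Hypothetical lease table kept on the external-store side."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._leases: Dict[str, Lease] = {}

    @staticmethod
    def _norm(path: str) -> str:
        # Trailing slash so that "/data" does not match "/database".
        return path.rstrip("/") + "/"

    def acquire(self, holder: str, subtree: str, ttl: float) -> bool:
        """Grant or renew a lease unless another holder has an overlapping one."""
        now = self._clock()
        key = self._norm(subtree)
        # Expired leases no longer block anyone.
        self._leases = {p: l for p, l in self._leases.items() if l.expires_at > now}
        for path, lease in self._leases.items():
            # Subtrees conflict when one is an ancestor of (or equal to) the other.
            overlaps = path.startswith(key) or key.startswith(path)
            if overlaps and lease.holder != holder:
                return False
        self._leases[key] = Lease(holder, now + ttl)
        return True

    def release(self, holder: str, subtree: str) -> None:
        """Drop the lease if (and only if) this holder owns it."""
        key = self._norm(subtree)
        lease = self._leases.get(key)
        if lease is not None and lease.holder == holder:
            del self._leases[key]
```

A holder that crashes simply stops renewing, and the subtree becomes available again once the TTL passes, which is the main reason to prefer leases over plain locks here.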

bq. Initially, writes are not supported through HDFS (read-only). Refresh is an important
case,
Thanks for the clarification. This happens to be the most important use case in our setup.
I think a read-only "small HDFS" should be able to simplify the design. A few additional questions:
# Should NN periodically refresh for the entire mounted subtree? Or fetch new metadata and
data on-demand? Or a mix of on-demand fetch and prefetching? E.g. when an application accesses
file {{/data/log1.txt}} and it's a cache miss on small HDFS, proactively fetch all files
under {{/data/}} to small HDFS. If we assume small HDFS has a significantly smaller capacity
than external store, refreshing the entire subtree seems too heavy (network bandwidth usage
and small HDFS capacity)?
# On the on-demand fetching path, the block will be transferred from external store to the
small HDFS DN first, and then from small HDFS DN to application. This increases latency
from 1 hop to 2 hops. It's tricky to reduce this.
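
A minimal sketch of the "mix" option from the first question, under some assumptions: small HDFS is modeled as an LRU cache with a byte capacity, a miss pulls the requested file through from the external store, and sibling files in the same directory are prefetched only while they fit without evicting anything. {{ExternalStore}} and {{SmallHdfsCache}} are hypothetical stand-ins, not HDFS classes, and whole files are used in place of blocks to keep it short.

```python
import posixpath
from collections import OrderedDict
from typing import Dict, List


class ExternalStore:
    """Stand-in for the big external store: an in-memory path -> bytes map."""

    def __init__(self, files: Dict[str, bytes]):
        self._files = dict(files)
        self.reads = 0  # number of transfers out of the external store

    def size(self, path: str) -> int:
        return len(self._files[path])

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self._files[path]

    def list_dir(self, directory: str) -> List[str]:
        # Direct children only, no recursion into subdirectories.
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._files
                      if p.startswith(prefix) and "/" not in p[len(prefix):])


class SmallHdfsCache:
    """Read-through LRU cache with opportunistic sibling prefetch."""

    def __init__(self, store: ExternalStore, capacity_bytes: int):
        self.store = store
        self.capacity = capacity_bytes
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0

    def _admit(self, path: str, data: bytes) -> None:
        if len(data) > self.capacity:
            return  # too large to cache at all; serve it uncached
        while self._used + len(data) > self.capacity:
            _, evicted = self._cache.popitem(last=False)  # evict least recently used
            self._used -= len(evicted)
        self._cache[path] = data
        self._used += len(data)

    def read(self, path: str) -> bytes:
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as recently used
            return self._cache[path]       # hit: small HDFS -> application only
        data = self.store.read(path)       # miss: external store -> small HDFS first
        self._admit(path, data)
        # Prefetch siblings only into free space, so prefetching never evicts.
        for sibling in self.store.list_dir(posixpath.dirname(path)):
            if sibling == path or sibling in self._cache:
                continue
            if self._used + self.store.size(sibling) <= self.capacity:
                self._admit(sibling, self.store.read(sibling))
        return data
```

Limiting prefetch to free capacity is one way to bound the bandwidth and capacity cost raised above. It does nothing for the 2-hop latency on the first miss, only for later reads of the prefetched siblings.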

> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>         Attachments: HDFS-9806-design.001.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous storage
systems. The guarantees and semantics provided by these systems are often similar, but not
identical to those of [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
Any client accessing multiple storage systems is responsible for reasoning about each system
independently, and must propagate and renew credentials for each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to immutable
file regions, opaque IDs, or other tokens that represent a consistent view of the data. While
correctness for arbitrary operations requires careful coordination between stores, in practice
we can provide workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
