hadoop-hdfs-issues mailing list archives

From "Virajith Jalaparti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system
Date Thu, 09 Jun 2016 21:47:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323433#comment-15323433 ]

Virajith Jalaparti commented on HDFS-9806:
------------------------------------------

Thanks [~ehiggs] (and [~PieterReuse] and [~Thomas Demoor]) for getting the PoC working against
S3! It will definitely be interesting to look at the changes you had to make.

bq. It makes sense to us for there to be a series of commands to attach, detach, and rescan
provided storage from the command line. 

Yes, agreed! Our first-cut solution for this is the same as what you suggested: have different
NNs manage different provided storages and have one particular NN manage the local storage.
Using HDFS federation and ViewFs across these NNs can enable the mount functionality you suggested.
A longer-term solution could be to add _mount points_ in the NN itself. This would not only allow
operating with a single NN but would also allow operations under a mount point to proceed without
holding the FSN lock.
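
To make the first-cut layout concrete, here is a rough Java sketch of a client-side mount table that maps path prefixes to the NN owning each subtree. It only illustrates the idea; it is not ViewFs code or its API, and the class name {{MountTableSketch}}, the host names, and the paths are all made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side mount table; a stand-in for what ViewFs provides, not its API. */
class MountTableSketch {
    // mount point (no trailing slash) -> URI of the NN that owns that subtree
    private final Map<String, String> links = new HashMap<>();

    void addLink(String mountPoint, String targetUri) {
        links.put(mountPoint, targetUri);
    }

    /** Rewrites a client path to an NN-qualified path using the longest matching mount point. */
    String resolve(String path) {
        String best = null;
        for (String mountPoint : links.keySet()) {
            // Match whole path components only, so "/local" does not cover "/localx".
            boolean matches = path.equals(mountPoint) || path.startsWith(mountPoint + "/");
            if (matches && (best == null || mountPoint.length() > best.length())) {
                best = mountPoint;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point covers " + path);
        }
        return links.get(best) + path.substring(best.length());
    }
}
```

With links {{/local}} and {{/provided/s3}} pointing at two different (hypothetical) NNs, a path under {{/provided/s3}} is routed to the NN that manages that provided store, while the rest of the namespace stays on the NN that manages local storage.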

We will update the document with a roadmap on how the implementation can be staged. 

bq. PROVIDED blocks are not stored with the {{INodeFile}}

Are you referring to over-replication of blocks due to read-through caching? If so, that is
addressed by [~chris.douglas]'s comment above. PROVIDED blocks are treated like any other
blocks. The {{INodeFile}} will contain references to these blocks, whose {{StorageType}} will be
marked as PROVIDED, and the blocks will have a _composite_ {{DatanodeStorage}} associated with
them as their location (as mentioned in Section 4.1 of the document). Whenever an attempt is made
to get the locations of these blocks, the composite is resolved to one of the DNs that advertised
this storage.
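
A minimal Java sketch of that resolution step is given below. It is not the HDFS implementation: the class {{CompositeStorageSketch}} and its methods are hypothetical, DNs are represented as plain strings, and picking a DN uniformly at random is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical model of a composite storage location; not the HDFS DatanodeStorage class. */
class CompositeStorageSketch {
    private final String storageId;
    private final List<String> datanodes = new ArrayList<>();
    private final Random random;

    CompositeStorageSketch(String storageId, Random random) {
        this.storageId = storageId;
        this.random = random;
    }

    /** Records that a DN has advertised this provided storage; duplicates are ignored. */
    void advertise(String datanode) {
        if (!datanodes.contains(datanode)) {
            datanodes.add(datanode);
        }
    }

    int advertisedCount() {
        return datanodes.size();
    }

    /** Resolves the composite to one concrete DN (uniform random choice is an assumption). */
    String resolve() {
        if (datanodes.isEmpty()) {
            throw new IllegalStateException("No DN has advertised storage " + storageId);
        }
        return datanodes.get(random.nextInt(datanodes.size()));
    }
}
```

The point of the sketch is that the block's recorded location is the composite, and a concrete DN is chosen only when locations are requested.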

bq.  If we want to attach multiple provided storage locations within a single NN, ...

When there are multiple PROVIDED storages, different {{storageId}} values can be used to distinguish
them. The multiplexing/de-multiplexing using the {{storageId}} can be handled inside {{ProvidedStorageMap}},
so as to avoid extensive changes to the {{BlockManager}}.
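
As a rough illustration of the multiplexing idea, the Java sketch below keeps a separate block table per {{storageId}} behind a single map. It is not the {{ProvidedStorageMap}} from the design document: the class {{ProvidedStorageMapSketch}}, its methods, and the string references to remote data are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of per-storageId multiplexing; not the real ProvidedStorageMap. */
class ProvidedStorageMapSketch {
    // storageId -> (blockId -> opaque reference into the external store)
    private final Map<String, Map<Long, String>> storages = new HashMap<>();

    /** Registers a provided storage; re-attaching an existing storageId is a no-op. */
    void attach(String storageId) {
        storages.putIfAbsent(storageId, new HashMap<>());
    }

    /** Drops a provided storage and all of its block mappings. */
    boolean detach(String storageId) {
        return storages.remove(storageId) != null;
    }

    void addBlock(String storageId, long blockId, String remoteRef) {
        Map<Long, String> blocks = storages.get(storageId);
        if (blocks == null) {
            throw new IllegalArgumentException("Unknown storageId: " + storageId);
        }
        blocks.put(blockId, remoteRef);
    }

    /** De-multiplexes on storageId first, then looks up the block within that storage. */
    Optional<String> lookup(String storageId, long blockId) {
        Map<Long, String> blocks = storages.get(storageId);
        return blocks == null ? Optional.empty() : Optional.ofNullable(blocks.get(blockId));
    }
}
```

Callers such as the {{BlockManager}} would only pass a {{storageId}} along with the block, so the per-store bookkeeping stays inside the map.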



> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>         Attachments: HDFS-9806-design.001.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous storage
systems. The guarantees and semantics provided by these systems are often similar, but not
identical to those of [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
Any client accessing multiple storage systems is responsible for reasoning about each system
independently, and must propagate and renew credentials for each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to immutable
file regions, opaque IDs, or other tokens that represent a consistent view of the data. While
correctness for arbitrary operations requires careful coordination between stores, in practice
we can provide workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

