hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4672) Support tiered storage policies
Date Fri, 12 Jul 2013 23:21:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707530#comment-13707530

Suresh Srinivas commented on HDFS-4672:

bq. Today the scope of HDFS-2832 was widened to duplicate this issue. Since the issues are
linked, that was not necessary. 
I disagree. Here is the brief comment I had posted on that jira - https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12539644&commentId=13192326
# Support for heterogeneous storages:
#* DN could support, along with disks, other types of storage such as flash.
#* Suitable storage can be chosen based on client preference such as need for random reads
# Block report scaling: instead of a single monolithic block report, a smaller block report
per storage becomes possible. This is important with the growth in disk capacity and number
of disks per datanode.
# Better granularity of storage failure handling:
#* DN could just indicate loss of storage and namenode can handle it better since it knows
the list of blocks belonging to a storage. 
#* DN could locally handle storage failures or provide decommissioning of a storage by marking
a storage as ReadOnly.
# Hot pluggability of disks/storages: adding and deleting a storage to a node is simplified.
# Other flexibility: includes future enhancements to balance storages within a datanode,
balancing the load (number of transceivers) per storage, etc., and better block placement strategies.

It briefly mentions the following, which are duplicated in this jira:
# Client preference for writing to storages - automatically means that block placement must
consider storage type etc.
# Support for different storage types in datanode and block reports based on that.
# Awareness of those storage types at the namenode (not just for block placement, but with various
other benefits)
# Affinity of replicas to a storage type.

Certainly you have elaborated on these points and added more implementation details. That does
not mean it is a different jira.
> Support tiered storage policies
> -------------------------------
>                 Key: HDFS-4672
>                 URL: https://issues.apache.org/jira/browse/HDFS-4672
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, libhdfs, namenode
>            Reporter: Andrew Purtell
> We would like to be able to create certain files on certain storage device classes (e.g.
spinning media, solid state devices, RAM disk, non-volatile memory). HDFS-2832 enables heterogeneous
storage at the DataNode, so the NameNode can gain awareness of what different storage options
are available in the pool and where they are located, but no API is provided for clients or
block placement plugins to perform device aware block placement. We would like to propose
a set of extensions that also have broad applicability to use cases where storage device affinity
is important:
> - Add an enum of generic storage device classes, borrowing from current taxonomy of the
storage industry
> - Augment DataNode volume metadata in storage reports with this enum
> - Extend the namespace so pluggable block policies can be specified on a directory and
storage device class can be tracked in the Inode. Perhaps this could be a larger discussion
on adding support for extended attributes in the HDFS namespace. The Inode should track both
the storage device class hint and the current actual storage device class. FileStatus should
expose this information (or xattrs in general) to clients.
> - Extend the pluggable block policy framework so policies can also consider, and specify,
affinity for a particular storage device class
> - Extend the file creation API to accept a storage device class affinity hint. Such a
hint can be supplied directly as a parameter, or, if we are considering extended attribute
support, then instead as one of a set of xattrs. The hint would be stored in the namespace
and also used by the client to indicate to the NameNode/block placement policy/DataNode constraints
on block placement. Furthermore, if xattrs or device storage class affinity hints are associated
with directories, then the NameNode should provide the storage device affinity hint to the
client in the create API response, so the client can provide the appropriate hint to DataNodes
when writing new blocks.
> - The list of candidate DataNodes for new blocks supplied by the NameNode to clients
should be weighted/sorted by availability of the desired storage device class. 
> - Block replication should consider storage device affinity hints. If a client move()s
a file from a location under a path with affinity hint X to under a path with affinity hint
Y, then all blocks currently residing on media X should be eventually replicated onto media
Y with the then excess replicas on media X deleted.
> - Introduce the concept of degraded path: a path can be degraded if a block placement
policy is forced to abandon a constraint in order to persist the block, when there may not
be available space on the desired device class, or to maintain the minimum necessary replication
factor. This concept is distinct from the corrupt path, where one or more blocks are missing.
Paths in degraded state should be periodically reevaluated for re-replication.
> - The FSShell should be extended with commands for changing the storage device class
hint for a directory or file. 
> - Clients like DistCP which compare metadata should be extended to be aware of the storage
device class hint. For DistCP specifically, there should be an option to ignore the storage
device class hints, enabled by default.
> Suggested semantics:
> - The default storage device class should be the null class, or simply the “default
class”, for all cases where a hint is not available. This should be configurable. hdfs-defaults.xml
could provide the default as spinning media.
> - A storage device class hint should be provided (and is necessary) only when the default
is not sufficient.
> - For backwards compatibility, any FSImage or edit log entry lacking a  storage device
class hint is interpreted as having affinity for the null class.
> - All blocks for a given file share the same storage device class. If the replication
factor for this file is increased, the replicas should all be placed on the same storage device class.
> - If one or more blocks for a given file cannot be placed on the required device class,
then the file is marked as degraded. Files in degraded state should be periodically reevaluated
for re-replication. 
> - A directory and path can only have one storage device affinity hint. If the file inode
specifies a hint, this is used, otherwise we walk up the path until a hint is found and use
that one, otherwise the default storage class is used.
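
The hint lookup in the last suggested semantic above (use the file's own hint, otherwise walk up the path, otherwise fall back to the default class) could be sketched roughly as follows. This is only an illustration under stated assumptions, not HDFS code: the class StorageHintSketch, the StorageClass enum values, and the setHint/resolve methods are all hypothetical names, and a plain in-memory map stands in for hints stored in the namespace.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from HDFS.
final class StorageHintSketch {

    // Illustrative device classes; DEFAULT stands in for the "null class".
    enum StorageClass { DEFAULT, SPINNING, SSD, RAM_DISK, NVM }

    // Explicit hints keyed by absolute path (file or directory).
    private final Map<String, StorageClass> hints = new HashMap<>();

    void setHint(String path, StorageClass cls) {
        hints.put(path, cls);
    }

    // Use the hint on the path itself if present; otherwise walk up
    // through the parent directories; otherwise fall back to DEFAULT.
    StorageClass resolve(String path) {
        String current = path;
        while (current != null) {
            StorageClass cls = hints.get(current);
            if (cls != null) {
                return cls;
            }
            current = parentOf(current);
        }
        return StorageClass.DEFAULT;
    }

    // Returns the parent path, or null once the root has been checked.
    private static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    public static void main(String[] args) {
        StorageHintSketch ns = new StorageHintSketch();
        ns.setHint("/hbase", StorageClass.SSD);
        ns.setHint("/hbase/archive/old.log", StorageClass.SPINNING);
        System.out.println(ns.resolve("/hbase/wal/000001"));      // inherited from /hbase
        System.out.println(ns.resolve("/hbase/archive/old.log")); // file's own hint
        System.out.println(ns.resolve("/tmp/scratch"));           // no hint anywhere
    }
}
```

Because each path holds at most one entry in the map, the "only one hint per directory or path" rule falls out of the data structure. A real implementation would presumably keep the hint in the inode or an xattr rather than in a side map.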

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
