Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 12 Jul 2013 23:21:51 +0000 (UTC)
From: "Suresh Srinivas (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12641453.1365454457600.43728.1373671311817@arcas>
In-Reply-To: <JIRA.12641453.1365454457600@arcas>
References: <JIRA.12641453.1365454457600@arcas>
Subject: [jira] [Commented] (HDFS-4672) Support tiered storage policies


    [ https://issues.apache.org/jira/browse/HDFS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707530#comment-13707530 ]

Suresh Srinivas commented on HDFS-4672:
---------------------------------------

bq. Today the scope of HDFS-2832 was widened to duplicate this issue. Since the issues are linked, that was not necessary.
I disagree. Here is the brief comment I had posted on that jira - https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12539644&commentId=13192326
{quote}
# Support for heterogeneous storages:
#* DN could support along with disks, other types of storage such as flash etc.
#* Suitable storage can be chosen based on client preference such as need for random reads etc.
# Block report scaling: instead of a single monolithic block report, a smaller block report per storage becomes possible. This is important with the growth in disk capacity and number of disks per datanode.
# Better granularity of storage failure handling:
#* DN could just indicate loss of storage and namenode can handle it better since it knows the list of blocks belonging to a storage.
#* DN could locally handle storage failures or provide decommissioning of a storage by marking a storage as ReadOnly.
# Hot pluggability of disks/storages: adding and deleting a storage to a node is simplified.
# Other flexibility: includes future enhancements to balance storages within a datanode, balancing the load (number of transceivers) per storage etc and better block placement strategies.
{quote}
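
To make the block report scaling and storage failure points concrete, here is a minimal sketch in Python, not HDFS code. The function names and the (storage_id, block_id) representation are assumptions made only for illustration.

```python
from collections import defaultdict


def per_storage_reports(replicas):
    """Group one datanode's replicas into one report per storage.

    `replicas` is an iterable of (storage_id, block_id) pairs. Both the
    shape of the input and this function are hypothetical, for illustration.
    """
    reports = defaultdict(list)
    for storage_id, block_id in replicas:
        reports[storage_id].append(block_id)
    # Sort block ids so each report has a stable order.
    return {sid: sorted(blocks) for sid, blocks in reports.items()}


def blocks_lost_with(reports, failed_storage_id):
    """Blocks affected when a single storage is reported as failed."""
    return reports.get(failed_storage_id, [])
```

With per-storage reports, a datanode only needs to name the failed storage; the receiver can look up the affected blocks itself.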

It briefly mentions the following, all of which are duplicated in this jira:
# Client preference for writing to storages - automatically means that block placement must consider storage type etc.
# Support for different storage types in datanode and block reports based on that.
# Awareness of those storage types at the namenode (not just for block placement, but with various other benefits)
# Affinity of replicas to a storage type.
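
As a rough illustration of what "block placement must consider storage type" could mean, the sketch below orders candidate datanodes by free space on the preferred storage type. It is Python for brevity, not the HDFS block placement API; `order_candidates` and the free-space dictionaries are hypothetical.

```python
def order_candidates(datanodes, wanted):
    """Order datanodes by free space on the wanted storage type.

    `datanodes` is a list of (name, {storage_type: free_bytes}) pairs.
    Returns (ordered_names, degraded), where `degraded` is True when no
    candidate has any free space on the wanted type. All names here are
    illustrative assumptions.
    """
    def sort_key(datanode):
        name, free = datanode
        # Most free space on the wanted type first; name breaks ties.
        return (-free.get(wanted, 0), name)

    ordered = [name for name, _ in sorted(datanodes, key=sort_key)]
    degraded = not any(free.get(wanted, 0) > 0 for _, free in datanodes)
    return ordered, degraded
```

The `degraded` flag only signals that the preference could not be honored; what to do in that case is a policy decision this sketch leaves open.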

Certainly you have elaborated on these points and added more implementation details. That does not mean it is a different jira.

> Support tiered storage policies
> -------------------------------
>
>                 Key: HDFS-4672
>                 URL: https://issues.apache.org/jira/browse/HDFS-4672
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, libhdfs, namenode
>            Reporter: Andrew Purtell
>
> We would like to be able to create certain files on certain storage device classes (e.g. spinning media, solid state devices, RAM disk, non-volatile memory). HDFS-2832 enables heterogeneous storage at the DataNode, so the NameNode can gain awareness of what different storage options are available in the pool and where they are located, but no API is provided for clients or block placement plugins to perform device aware block placement. We would like to propose a set of extensions that also have broad applicability to use cases where storage device affinity is important:
>
> - Add an enum of generic storage device classes, borrowing from current taxonomy of the storage industry
>
> - Augment DataNode volume metadata in storage reports with this enum
>
> - Extend the namespace so pluggable block policies can be specified on a directory and storage device class can be tracked in the Inode. Perhaps this could be a larger discussion on adding support for extended attributes in the HDFS namespace. The Inode should track both the storage device class hint and the current actual storage device class. FileStatus should expose this information (or xattrs in general) to clients.
>
> - Extend the pluggable block policy framework so policies can also consider, and specify, affinity for a particular storage device class
>
> - Extend the file creation API to accept a storage device class affinity hint. Such a hint can be supplied directly as a parameter, or, if we are considering extended attribute support, then instead as one of a set of xattrs. The hint would be stored in the namespace and also used by the client to indicate to the NameNode/block placement policy/DataNode constraints on block placement. Furthermore, if xattrs or device storage class affinity hints are associated with directories, then the NameNode should provide the storage device affinity hint to the client in the create API response, so the client can provide the appropriate hint to DataNodes when writing new blocks.
>
> - The list of candidate DataNodes for new blocks supplied by the NameNode to clients should be weighted/sorted by availability of the desired storage device class.
>
> - Block replication should consider storage device affinity hints. If a client move()s a file from a location under a path with affinity hint X to under a path with affinity hint Y, then all blocks currently residing on media X should be eventually replicated onto media Y with the then excess replicas on media X deleted.
>
> - Introduce the concept of degraded path: a path can be degraded if a block placement policy is forced to abandon a constraint in order to persist the block, when there may not be available space on the desired device class, or to maintain the minimum necessary replication factor. This concept is distinct from the corrupt path, where one or more blocks are missing. Paths in degraded state should be periodically reevaluated for re-replication.
>
> - The FSShell should be extended with commands for changing the storage device class hint for a directory or file.
>
> - Clients like DistCP which compare metadata should be extended to be aware of the storage device class hint. For DistCP specifically, there should be an option to ignore the storage device class hints, enabled by default.
>
> Suggested semantics:
>
> - The default storage device class should be the null class, or simply the “default class”, for all cases where a hint is not available. This should be configurable. hdfs-defaults.xml could provide the default as spinning media.
>
> - A storage device class hint should be provided (and is necessary) only when the default is not sufficient.
>
> - For backwards compatibility, any FSImage or edit log entry lacking a storage device class hint is interpreted as having affinity for the null class.
>
> - All blocks for a given file share the same storage device class. If the replication factor for this file is increased the replicas should all be placed on the same storage device class.
>
> - If one or more blocks for a given file cannot be placed on the required device class, then the file is marked as degraded. Files in degraded state should be periodically reevaluated for re-replication.
>
> - A directory and path can only have one storage device affinity hint. If the file inode specifies a hint, this is used, otherwise we walk up the path until a hint is found and use that one, otherwise the default storage class is used.
