hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Fri, 24 Mar 2017 09:38:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940055#comment-15940055 ]

Kai Zheng commented on HDFS-7343:

Thanks Anoop for the thoughts and good questions!

bq. Tracking of data hotness and movement is at what level? Block level? Or only file level?
We want to support block level, but we are not sure how useful it would be, since modern HDFS
data files may mostly be single-block files. If it is typical for clusters to hold many large
multi-block files whose blocks differ in hotness, we may consider adding block-level
support. I wonder if [~andrew.wang] could comment on this. Thanks.
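
To illustrate the trade-off, here is a minimal Python sketch. It is not SSM code: the class,
its method names and the file paths are all hypothetical, and plain read counts stand in for
whatever hotness metric would really be used. It shows that for a single-block file the
file-level and block-level views carry the same information, while for a multi-block file
only the block-level view exposes skewed access.

```python
from collections import defaultdict


class HotnessTracker:
    """Hypothetical tracker: counts reads per block and derives per-file totals."""

    def __init__(self):
        # (file path, block index) -> number of recorded reads
        self.block_reads = defaultdict(int)

    def record_read(self, path, block_index):
        """Record one read of the given block of the given file."""
        self.block_reads[(path, block_index)] += 1

    def file_hotness(self):
        """File-level view: total reads per file, summed over its blocks."""
        totals = defaultdict(int)
        for (path, _), count in self.block_reads.items():
            totals[path] += count
        return dict(totals)

    def block_hotness(self, path):
        """Block-level view: reads per block index for one file."""
        return {idx: c for (p, idx), c in self.block_reads.items() if p == path}


tracker = HotnessTracker()

# Single-block file: both views say the same thing.
for _ in range(5):
    tracker.record_read("/data/small.parquet", 0)

# Multi-block file with skewed access: block 0 is hot, block 3 is nearly cold.
for _ in range(9):
    tracker.record_read("/data/large.log", 0)
tracker.record_read("/data/large.log", 3)

print(tracker.file_hotness())
print(tracker.block_hotness("/data/large.log"))
```

At file level, /data/large.log looks uniformly hot; only the per-block counts show that
most of its reads hit one block, which is the case where block-level support would pay off.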

bq. HBase, being a user, we will compact our HFiles into one ...
After discussing this with Anoop offline, I got the point. HBase does its own fine-grained
caching, so it does not need the help of the HDFS cache; therefore SSM can't help HBase in the
cache path. In other cases, HBase may have cold tables and even cold regions, so the
underlying HDFS could hold blocks of different temperatures, and there HDFS HSM could
help. SSM aims to ease HDFS-HSM deployment and usage, so SSM can help HBase in such cases.

For HBase, I think [~anu] has some considerations. Anu, could you share your points about
this? Thanks!

> HDFS smart storage management
> -----------------------------
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFSSmartStorageManagement-General-20170315.pdf, HDFS-Smart-Storage-Management.pdf,
HDFSSmartStorageManagement-Phase1-20170315.pdf, HDFS-Smart-Storage-Management-update.pdf,
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
policy engine considering file attributes, metadata, data temperature, storage type, EC codec,
available hardware capabilities, user/application preference, etc.
> Modified the title to reflect the new purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution that provides a
smart storage management service for convenient, intelligent and effective use
of erasure coding or replicas, the HDFS cache facility, the HSM offering, and all kinds of tools
(balancer, mover, disk balancer and so on) in a large cluster.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
