hadoop-hdfs-issues mailing list archives

From "Wei Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Mon, 09 Jan 2017 10:15:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811374#comment-15811374 ]

Wei Zhou commented on HDFS-7343:

Thanks [~anu] for reviewing the design document and the great comments. 
1. I would like to understand the technical trade-offs that was considered in making this
NN already has mechanisms for collecting metrics and events from DNs, so it is better to
store this data in NN and keep SSM stateless, as Andrew suggested. It also makes SSM more
robust, because the data is stored in NN: when an SSM node fails, we can simply launch another
instance on another node.

2. throttle the number of times a particular rule is executed in a time window? 
Yes, good suggestion. I think we can make it part of the rule itself (for example, by providing a keyword).
For now, it is better to keep SSM predictable: a rule gets executed whenever its condition
is fulfilled. If throttling were added at the rule-engine level, it would be hard for users to predict
when a rule executes, which introduces uncertainty. We can implement automatic rule-engine-level
throttling in Phase 2.
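
A minimal sketch of what a per-rule throttle could look like if it were declared as part of the rule. The class name `RuleThrottle` and its parameters `max_runs` and `window_seconds` are hypothetical and do not come from the design document; timestamps are plain seconds passed in by the caller.

```python
from collections import deque


class RuleThrottle:
    """Allow at most `max_runs` executions of one rule per sliding time window.

    Hypothetical helper for illustration only; not part of SSM.
    """

    def __init__(self, max_runs, window_seconds):
        self.max_runs = max_runs
        self.window_seconds = window_seconds
        self._runs = deque()  # timestamps of executions still inside the window

    def allow(self, now):
        """Return True and record the run if the rule may execute at time `now`."""
        # Forget executions that have fallen out of the window.
        while self._runs and now - self._runs[0] >= self.window_seconds:
            self._runs.popleft()
        if len(self._runs) >= self.max_runs:
            return False
        self._runs.append(now)
        return True
```

Because the limit is attached to the rule, a user who writes the rule also sees the limit, which keeps execution predictable in the sense described above.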

3. Do we need to store the rules inside Namenode ?
Rules are the core of SSM. For convenience and reliability, it is better to store them in NN,
which keeps SSM simple and stateless as suggested. Also, a rule is very small (plain text),
so it should never be a burden on NN.

4. HA support
Yes, good question. We can support HA in many ways, for example by periodically checkpointing the
data to HDFS or by storing the data in the same way as the edit log.

5. but how do you intend to protect this end point?
Yes, if the cluster implements the Kerberos protocol, then the web interface, consoles and other
parts of SSM all work with Kerberos enabled.

6. How do we prevent a run-away rule?
This is a very good question.
First, we provide a verification mechanism when a rule is added. For example, we can warn
the user when the number of candidate files for an action (such as move) exceeds a
certain value.
Second, the execution state and other related info can also be shown in the dashboard
or queried, so users can conveniently track the status and take action accordingly. It would
also be good to implement a timeout mechanism.
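
The verification step could be sketched roughly as follows. The function name `verify_rule`, the `max_candidates` parameter and its default value are assumptions made for illustration; the design only says that a warning is given when the candidate files exceed some value.

```python
def verify_rule(action, candidate_files, max_candidates=1000):
    """Return a list of warning strings for a rule that is about to be added.

    `max_candidates` is a hypothetical limit; an empty list means no warnings.
    """
    warnings = []
    count = len(candidate_files)
    if count > max_candidates:
        warnings.append(
            f"action '{action}' matches {count} files, "
            f"more than the limit of {max_candidates}"
        )
    return warnings
```

A caller would show the returned warnings to the user before the rule is accepted, so a rule that unexpectedly matches a very large set of files is noticed before it runs.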

7. On the HDFS client querying SSM before writing, what happens if the SSM is down?
Sorry for not making this clear. The client queries SSM only once, just before creating the file;
SSM does not need to participate in the write procedure. So the HDFS client bypasses SSM when
the query fails and goes back to the original workflow. It has almost no
effect on existing I/O.
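
A rough sketch of that fallback, assuming the query returns some storage hint for the new file. The function `choose_storage_hint`, the `query_ssm` callable and the `default_hint` parameter are hypothetical names, not the actual client API.

```python
def choose_storage_hint(path, query_ssm, default_hint=None):
    """Ask SSM once for a hint before creating `path`; bypass SSM on failure.

    `query_ssm` is a hypothetical callable that raises an OSError subclass
    (for example ConnectionError or TimeoutError) when SSM is unreachable.
    """
    try:
        return query_ssm(path)
    except OSError:
        # SSM is down: continue with the original create/write flow.
        return default_hint
```

The single query sits outside the write path, so a failed call costs one exception and nothing more.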

I would love to learn how this is working out in real world clusters.
We built some prototypes as a POC. Three typical cases were implemented, with some simplification:
# Move data to SSD based on the access count
# Cache data based on the access count
# Archive data based on file's age

The following chart shows the test result for the first case. The rule is "if a file has been
read more than 2 times within 10 mins, then move the file to SSD". As we can see, the time
used for reads decreases after the rule has been executed.
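
The condition of that rule can be sketched as a sliding-window access counter. This is only an illustration of the rule's logic, not the prototype's code; the class `AccessCountRule` and its method names are hypothetical, and the actual move to SSD is left to the caller.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60  # "within 10 mins"
READ_THRESHOLD = 2        # "more than 2 times"


class AccessCountRule:
    """Track reads per file and report when the rule's condition is met."""

    def __init__(self, window=WINDOW_SECONDS, threshold=READ_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._reads = defaultdict(deque)  # path -> read timestamps in seconds

    def record_read(self, path, now):
        """Record a read at time `now`; return True if the file should move to SSD."""
        reads = self._reads[path]
        reads.append(now)
        # Keep only the reads that happened within the window ending at `now`.
        while reads and now - reads[0] > self.window:
            reads.popleft()
        return len(reads) > self.threshold
```

With the default values, the third read of a file inside any 10-minute span is the first one that satisfies the condition.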

I think we have accidentally omitted reference to our classic balancer here.
Yes, thanks for the reminder.

> HDFS smart storage management
> -----------------------------
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFS-Smart-Storage-Management-update.pdf, HDFS-Smart-Storage-Management.pdf,
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
policy engine considering file attributes, metadata, data temperature, storage type, EC codec,
available hardware capabilities, user/application preferences, etc.
> Modified the title for re-purpose.
> We'd extend this effort somewhat and aim to work on a comprehensive solution that provides
a smart storage management service for convenient, intelligent and effective use
of erasure coding or replicas, the HDFS cache facility, the HSM offering, and all kinds of tools (balancer,
mover, disk balancer and so on) in a large cluster.

This message was sent by Atlassian JIRA
