hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Thu, 02 Feb 2017 04:14:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849385#comment-15849385 ]

Andrew Wang commented on HDFS-7343:
-----------------------------------

Hi Wei, thanks for posting the new doc. At a high level, I think the scope of this project
is really big. The doc doesn't go into sufficient implementation detail for me to really understand
what's involved in the first development phase. Splitting this into further phases would help.
One possible staging:

* Define triggers and the data required to implement them
* Data collection from HDFS
* Implement the different actions
* The rules syntax and definition

Other comments:

* Could you describe how you will satisfy use cases 4 and 5 in more detail?
* Please describe the complete set of NameNode changes required, particularly the use of LevelDB
and additional state
* The lack of HA means this will be a non-starter for many production deployments. Could you
comment on the difficulty of implementing HA? This should really be covered in the initial
design.
* Why are the StorageManager and CacheManager treated as separate components? If the StorageManager
incorporates storage policies, EC, S3, etc., it already seems quite general.
* Why are ChangeStoragePolicy and EnforceStoragePolicy separate actions? Is there a use case
for changing the SP but not moving the data?

Metric collection:
* What set of metrics do you plan to collect from HDFS?
* Right now we don't have centralized read statistics, which would obviously be useful for implementing
a caching policy. Is there a plan to implement this?

Triggers:
* Could you provide a complete description of the trigger syntax? Notably, I don't see a way
to "hash" the time in the examples.
* How often does the SSM wake up to check rules?
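
For illustration only (this is not from the design doc): one reading of "hashing" the time is the approach some schedulers take, where each rule gets a stable offset derived from a hash of its name, so rules that share a period do not all fire at the same instant. A minimal sketch under that assumption follows; the class, method, and rule names are hypothetical.

```java
final class HashedSchedule {
    /** Stable offset in [0, periodSeconds) derived from the rule id (hypothetical scheme). */
    static long offsetSeconds(String ruleId, long periodSeconds) {
        if (periodSeconds <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        // String.hashCode() is specified by the language, so the offset is
        // the same on every JVM and across restarts.
        return Math.floorMod((long) ruleId.hashCode(), periodSeconds);
    }

    /** First firing time strictly after nowEpochSeconds for this rule. */
    static long nextFire(String ruleId, long periodSeconds, long nowEpochSeconds) {
        long offset = offsetSeconds(ruleId, periodSeconds);
        // Latest slot at or before "now" that is congruent to offset mod period.
        long lastSlot = Math.floorDiv(nowEpochSeconds - offset, periodSeconds)
                * periodSeconds + offset;
        return lastSlot + periodSeconds;
    }
}
```

With this scheme, two hourly rules named differently would usually wake at different minutes of the hour, while each rule keeps the same minute from run to run.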

Conditions:
* Could you provide a complete list of conditions that are planned?
* How do you plan to implement accessCount over a time range?
* Any other new metrics or information you plan to add to HDFS as part of this work?
* Prefer we use atime or ctime rather than "age", since they're more specific
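
As a point of reference for the accessCount question, one common way to count events over a trailing time range is a ring of fixed-width time buckets. The sketch below is an assumption about how such a counter could look, not a description of the proposed SSM design; the class name and the in-memory layout are hypothetical, and it ignores persistence and thread safety.

```java
import java.util.Arrays;

final class WindowedAccessCounter {
    private final long bucketMillis;
    private final long[] counts;     // accesses recorded per slot
    private final long[] bucketIds;  // absolute bucket number each slot currently holds

    WindowedAccessCounter(int buckets, long bucketMillis) {
        if (buckets <= 0 || bucketMillis <= 0) {
            throw new IllegalArgumentException("buckets and bucketMillis must be positive");
        }
        this.bucketMillis = bucketMillis;
        this.counts = new long[buckets];
        this.bucketIds = new long[buckets];
        Arrays.fill(bucketIds, -1L);
    }

    /** Records one access; timestamps are assumed non-negative. */
    void record(long timestampMillis) {
        long id = timestampMillis / bucketMillis;
        int slot = (int) (id % counts.length);
        if (bucketIds[slot] > id) {
            return; // older than anything the window still tracks; drop it
        }
        if (bucketIds[slot] != id) {
            bucketIds[slot] = id; // slot is being reused for a newer bucket
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Accesses in the last counts.length buckets, including the current one. */
    long countInWindow(long nowMillis) {
        long newest = nowMillis / bucketMillis;
        long oldest = newest - counts.length + 1;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketIds[i] >= oldest && bucketIds[i] <= newest) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

The trade-off is resolution versus memory: the window is only accurate to one bucket width, and one such counter per file would be expensive for a large namespace.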

Object matching:
* Could you provide a complete definition of the object matching syntax?
* Do rules support basic boolean operators like AND, OR, NOT for objects and conditions?
* Is there a reason you chose to implement regex matches rather than file globbing for path
matching? Are these regexes on the full path, or per path component?
* Aren't many of these matches going to require listing the complete filesystem? Or are you
planning to use HDFS inotify?
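
To make the regex-versus-glob distinction concrete, here is a small sketch using the JDK's local-filesystem glob matcher as a stand-in (it is not HDFS's own globbing, and the paths are made up). It assumes a Unix-like default filesystem, where a single `*` stops at a path separator while a regex `.` does not.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class PathMatchDemo {
    /** Glob match using the JDK default filesystem's matcher (stand-in for HDFS globbing). */
    static boolean globMatches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    /** Regex match that must cover the whole path string. */
    static boolean regexMatchesFullPath(String regex, String path) {
        return Pattern.matches(regex, path);
    }
}
```

Under those assumptions, the glob `/logs/*.log` matches `/logs/app.log` but not `/logs/2017/app.log`, whereas the full-path regex `/logs/.*\.log` matches both, because `.` also matches `/`. A glob needs `**` to cross directory boundaries.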

Actions:
* The "cache" action is underspecified. What cache pool is used?
* How often does the SSM need to poll the NN to get information? How much information does it
fetch each time? Some triggers might require listing a lot of the namespace.
* Can actions happen concurrently? Is there a way of limiting concurrency?
* Can you run multiple actions in a rule? Is there a syntax for defining "functions"?
* Are there substitutions that can be used to reference the filename, e.g. "${file}"? The same
goes for DN objects: the diskbalancer needs the DN host:port.
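
The substitution question could be answered with something as small as the following sketch. Only the `${file}` placeholder comes from the comment above; the class name, the other variable names, and the action strings in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ActionTemplate {
    // Matches ${name}, where name is letters, digits, '_' or '.'.
    private static final Pattern VAR =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_.]*)\\}");

    /** Replaces every ${name} with its value; fails loudly on undefined names. */
    static String expand(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("undefined variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in file names literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, a hypothetical action string `cache ${file}` would expand against the matched file path, and a disk balancer action could use a variable such as `${dn.hostport}` for the DataNode address. Rejecting undefined variables at expansion time is a design choice here; validating them when the rules file is loaded would surface errors earlier.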

Operational questions:
* Is there an audit log for actions taken by the SSM?
* Is there a way to see when each action started, stopped, and its status?
* How are errors and logs from actions exposed?
* What metrics are exposed by the SSM?
* Why are there configuration options to enable individual actions? Isn't this behavior already
defined by the rules file?
* Why does the SSM need a "dfs.ssm.enabled" config? Is there a use case for having an SSM service
started, but not enabled?
* Is the rules file dynamically refreshable?
* What do we do if the rules file is malformed? What do we do if there are conflicting rules
or multiple matches?
* dfs.ssm.msg.datanode.interval is described as the polling interval for the NN. Is this a typo?
* What happens if multiple SSMs are accidentally started?

> HDFS smart storage management
> -----------------------------
>
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFS-Smart-Storage-Management.pdf, HDFS-Smart-Storage-Management-update.pdf, move.jpg
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
> policy engine considering file attributes, metadata, data temperature, storage type, EC codec,
> available hardware capabilities, user/application preference and etc.
> Modified the title for re-purpose.
> We'd extend this effort some bit and aim to work on a comprehensive solution to provide
> smart storage management service in order for convenient, intelligent and effective utilizing
> of erasure coding or replicas, HDFS cache facility, HSM offering, and all kinds of tools (balancer,
> mover, disk balancer and so on) in a large cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
