hadoop-hdfs-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Tue, 10 Jan 2017 02:37:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813615#comment-15813615 ]

Anu Engineer commented on HDFS-7343:
------------------------------------

bq.  Also, it makes SSM more stable as these data stored in NN. When SSM node failure happened,
we can simply launch another instance on another node.
I do see now where this thought came from. However, I think that SSM should be able to stand
independently and not rely on the Namenode. Here are some reasons I can think of:

1. SSM can be implemented with no changes to the NN, which makes for easier and faster development.
2. No added complexity in the Namenode.
3. Moving state from SSM to the Namenode makes SSM simpler, but makes the Namenode that much
more complicated.

So while I have a better understanding of the motivation, making the NN store rules and metrics
that are needed by SSM feels like the wrong choice. As I said earlier, if you want to run this
in other scenarios, this dependency on the NN makes it hard. For example, someone might be running
SSM in the cloud and using a cloud-native file system instead of the Namenode.

bq. This brings in uncertainty to users. We can implement automatical rule-engine level throttle
in Phase 2.
If a rule is misbehaving then the NN will become slow, so it does bring in uncertainty. But I am
fine with the choice of postponing this to a later stage. Would you be able to count how many times
a particular rule was triggered in a given time window? That would be useful for debugging this
issue.
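
A rough sketch of the kind of per-rule counter I mean (class and method names are mine, not from
the design doc):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: counts how often each rule fired inside a sliding time window. */
public class RuleTriggerCounter {
  private final long windowMillis;
  private final Map<String, Deque<Long>> firings = new HashMap<>();

  public RuleTriggerCounter(long windowMillis) {
    this.windowMillis = windowMillis;
  }

  /** Record one firing of the given rule. */
  public synchronized void recordTrigger(String ruleId) {
    firings.computeIfAbsent(ruleId, k -> new ArrayDeque<>())
           .addLast(System.currentTimeMillis());
  }

  /** How many times the rule fired within the last window; handy when debugging a noisy rule. */
  public synchronized int countInWindow(String ruleId) {
    Deque<Long> times = firings.get(ruleId);
    if (times == null) {
      return 0;
    }
    long cutoff = System.currentTimeMillis() - windowMillis;
    while (!times.isEmpty() && times.peekFirst() < cutoff) {
      times.pollFirst(); // drop firings that fell outside the window
    }
    return times.size();
  }
}
{code}

Exposing countInWindow() per rule on the dashboard would be enough to spot a rule that is hammering
the NN.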

bq. Rule is the core part for SSM to function. For convenient and reliable consideration,
it's better to store it in NN to keep SSM simple and stateless as suggested.
Rules are a core part of SSM, so let us store them in SSM instead of storing them in the NN, or feel
free to store them as a file on HDFS. Modifying the Namenode to store the config of some other service
will make the Namenode a dumping ground for every other service's config.
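
To be concrete about "store it as a file on HDFS", here is a minimal sketch using the standard
FileSystem API; the path and the serialization format are placeholders, not a proposal:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/** Illustrative sketch: persist SSM rule text on HDFS instead of inside the Namenode. */
public class SsmRuleStore {
  // Placeholder location; the real path would come from SSM configuration.
  private static final Path RULES_FILE = new Path("/system/ssm/rules.json");

  private final FileSystem fs;

  public SsmRuleStore(Configuration conf) throws IOException {
    this.fs = FileSystem.get(conf);
  }

  /** Write the serialized rules to a temp file, then rename it over the old one. */
  public void saveRules(String serializedRules) throws IOException {
    Path tmp = new Path(RULES_FILE + ".tmp");
    try (FSDataOutputStream out = fs.create(tmp, true)) {
      out.write(serializedRules.getBytes(StandardCharsets.UTF_8));
    }
    fs.delete(RULES_FILE, false); // may not exist yet, ignore the result
    fs.rename(tmp, RULES_FILE);
  }

  /** Read the rules back, e.g. when a new SSM instance starts after a failure. */
  public String loadRules() throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (FSDataInputStream in = fs.open(RULES_FILE)) {
      IOUtils.copyBytes(in, buf, 4096, false);
    }
    return buf.toString(StandardCharsets.UTF_8.name());
  }
}
{code}

A replacement SSM instance launched on another node can pick the rules up again with loadRules(),
which covers the failover case without touching the NN.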

bq. Yes, good question. We can support HA by many ways, for example, periodically checkpoint
the data to HDFS or store the data in the same way as edit log.

Sorry, I am not able to understand this response clearly. Are you now saying that we will support
HA?

bq. First, we provide some verification mechanism when adding some rule. For example, we can
give the user some warning when the candidate files of an action (such as move) exceeding
some certain value. 

This is a classic time-of-check to time-of-use problem. When the rule gets written there may be
no issue, but as the file count increases this becomes a problem.
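
One way to mitigate it, sketched below with made-up names, is to re-count the candidate files when
the action actually runs instead of trusting the check done when the rule was admitted:

{code:java}
/**
 * Illustrative sketch: guard against the time-of-check/time-of-use gap by
 * re-evaluating the candidate set right before the action executes.
 */
public class MoveActionExecutor {

  /** Hypothetical rule abstraction, only what this sketch needs. */
  public interface Rule {
    String getId();
    long evaluateCandidateFileCount();
    void runActions();
  }

  private final long maxCandidateFiles;

  public MoveActionExecutor(long maxCandidateFiles) {
    this.maxCandidateFiles = maxCandidateFiles;
  }

  public void execute(Rule rule) {
    // Re-check at time of use; the count taken at rule-creation time may be stale.
    long candidates = rule.evaluateCandidateFileCount();
    if (candidates > maxCandidateFiles) {
      // Surface the problem instead of silently moving millions of files.
      throw new IllegalStateException("Rule " + rule.getId() + " now matches "
          + candidates + " files, above the limit of " + maxCandidateFiles);
    }
    rule.runActions();
  }
}
{code}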

bq. Second, the execution state and other info related info can also be showed in the dashboard
or queried. It's convenient for users to track the status and take actions accordingly. It's
also very good to implement a timeout mechanism.

Agreed, but have we not now introduced the uncertainty issue back into the solution? I thought
we did not want to restrict the number of times a rule fires, since that would introduce uncertainty.

bq. HDFS client will bypass SSM when the query fails, then the client goes back to the original
working flow. It has almost no effect on the existing I/O.
So then the SSM rules are violated? How does it deal with that issue? Since you have to
deal with SSM being down anyway, why have the HDFS client even talk to SSM in an I/O path? Why not
just rely on the background SSM logic and on the rules doing the right thing?
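
For reference, the bypass I would expect on the client side is roughly the following; SsmClient is
a hypothetical name, the point is only that any SSM failure drops straight back to the normal HDFS
path:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative sketch of the client-side bypass: consult SSM if possible, otherwise fall through. */
public class SsmAwareReader {

  /** Hypothetical interface; nothing like this exists in HDFS today. */
  public interface SsmClient {
    void recordAccess(String path) throws IOException;
  }

  private final FileSystem fs;   // the normal HDFS client path
  private final SsmClient ssm;   // hypothetical SSM query client

  public SsmAwareReader(FileSystem fs, SsmClient ssm) {
    this.fs = fs;
    this.ssm = ssm;
  }

  public FSDataInputStream open(Path path) throws IOException {
    try {
      // Optional hint to SSM, e.g. for access counting; failure must not block the read.
      ssm.recordAccess(path.toString());
    } catch (IOException e) {
      // SSM down or slow: ignore and keep the normal I/O path untouched.
    }
    return fs.open(path);
  }
}
{code}

If that is really all the client gets from SSM in the read path, it reinforces my point that the
background rule engine alone could do the job.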


Thanks for sharing the graph, I appreciate it.

> HDFS smart storage management
> -----------------------------
>
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFS-Smart-Storage-Management-update.pdf, HDFS-Smart-Storage-Management.pdf,
move.jpg
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
policy engine considering file attributes, metadata, data temperature, storage type, EC codec,
available hardware capabilities, user/application preference and etc.
> Modified the title for re-purpose.
> We'd extend this effort some bit and aim to work on a comprehensive solution to provide
smart storage management service in order for convenient, intelligent and effective utilizing
of erasure coding or replicas, HDFS cache facility, HSM offering, and all kinds of tools (balancer,
mover, disk balancer and so on) in a large cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

