hadoop-hdfs-issues mailing list archives

From "Wei Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Wed, 19 Oct 2016 05:10:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587704#comment-15587704 ]

Wei Zhou commented on HDFS-7343:
--------------------------------

Thanks [~anu] for reviewing the design document and for the great comments!
My replies to your comments:
{quote}
1. Is this service attempting to become the answer for all administrative problems of HDFS?
In other words, is this service trying to be a catch-all service?
I am not able to see what is common between caching and file placement and between running
distcp for remote replication and running balancer and disk balancer.
{quote}
In the long run, SSM aims to provide users with an end-to-end storage management automation
solution, and any facility may be used in this project toward that goal. The use cases listed
in the document are only examples of scenarios where SSM can be used and of what it can do
with these facilities.
{quote}
But then in the latter parts of the document we drag in distcp, disk balancer and balancer
issues. SSM might be the right place for these in the long run, but my suggestion is to focus
on the core parts of the service and then extend it to other things once the core is stabilized.
{quote}
You are absolutely right. We should focus on implementing the core module now rather than
taking on too much at the same time, since the core is the basis for the other functions.

{quote}
2. Do we need a new rules language – Would you please consider using a language which admins
will already know, for example, if we can write these rules in python or even JavaScript,
you don’t need to invent a whole new language. Every time I have to configure Kerberos rules,
I have to lookup the mini-regex meanings. I am worried that this little rule language will
blow up and once it is in, this is something that we will need to support for the long term.
{quote}
Yes, it's a very good question, and we have thought about it before. We aimed to provide
administrators/users with a simple, specific rule language that involves little beyond the
rule logic itself. In fact, a rule is very simple: it only has to declare when an action
applies and which action is to be applied to some objects (a file, a node, etc.). A
general-purpose language like Python or JavaScript may be too heavy for defining a rule.
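
To make the three parts of a rule concrete (the objects, the "when" condition and the action),
here is a minimal sketch. It is not the proposed rule syntax; the names Rule and evaluate, the
file attributes and the action strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """Hypothetical rule: apply `action` to every object for which `condition` holds."""
    name: str
    condition: Callable[[Dict], bool]  # the "when" part
    action: str                        # the "which action" part


def evaluate(rules: List[Rule], objects: List[Dict]) -> List[Tuple[str, str]]:
    """Return (object path, action) pairs for every rule whose condition matches."""
    decisions = []
    for obj in objects:
        for rule in rules:
            if rule.condition(obj):
                decisions.append((obj["path"], rule.action))
    return decisions


# Illustrative rules and file metadata (assumed values, not taken from the design document).
rules = [
    Rule("cache-hot-files", lambda f: f["accesses_last_hour"] >= 10, "cache"),
    Rule("archive-cold-files",
         lambda f: f["age_days"] > 90 and f["accesses_last_hour"] == 0,
         "move_to_archive"),
]
files = [
    {"path": "/data/hot.parquet", "accesses_last_hour": 25, "age_days": 3},
    {"path": "/data/old.log", "accesses_last_hour": 0, "age_days": 200},
    {"path": "/data/warm.csv", "accesses_last_hour": 2, "age_days": 30},
]
decisions = evaluate(rules, files)
```

The conditions are written here as Python lambdas only to keep the sketch short; a dedicated
rule language would express the same three parts without exposing a general-purpose language.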

{quote}
3. In this model we are proposing a push model where the datanode and Namenode push data
to some kafka endpoint. I would prefer if namenode and datanode were not aware of this service
at all. This service can easily connect to namenode and read almost all the data which is
needed. If you need extra RPC to be added in datanode and namenode that would be an option too.
But creating a dependency from namenode and all datanodes in a cluster seems to be something
that you want to do after very careful consideration. If we move to a pull model, you might
not even need kafka service to be running in the initial versions.
{quote}
Good point! This is also a very good way to implement SSM.
With a pull model, the advantages are:
(1) No dependency on the Kafka service, which indeed makes development, testing and
deployment much easier.
(2) A closer relationship with HDFS, which may make it possible to support features that
cannot be done in the model described in the design document.
The disadvantages are:
(1) It may have performance issues. SSM has to receive messages in a timely manner in order
to work effectively. To keep the delay low, SSM would have to query the NameNodes for
messages at a very high frequency all the time. It is also very hard for SSM to query
DataNodes one by one for messages in a large-scale cluster.
(2) It complicates message collection and management. If SSM is stopped by the user or
crashes while the HDFS cluster is still working, messages from the nodes will be lost
without Kafka, which makes it hard for SSM to collect historical data.
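
As a rough illustration of disadvantage (2), the toy model below compares a node that retains
only its most recent events with a persistent log that can be replayed after an outage. It is
a sketch under stated assumptions: BoundedSource and DurableLog are hypothetical names, not
HDFS or Kafka APIs, and the retention capacity of 3 is arbitrary.

```python
from collections import deque


class BoundedSource:
    """Toy stand-in for a node that keeps only its most recent events in memory."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)  # older events are silently dropped
        self.next_id = 0

    def emit(self):
        event_id = self.next_id
        self.events.append(event_id)
        self.next_id += 1
        return event_id

    def poll(self, after_id):
        """Pull model: return the retained events newer than `after_id`."""
        return [e for e in self.events if e > after_id]


class DurableLog:
    """Toy stand-in for a persistent message queue that keeps every event."""

    def __init__(self):
        self.events = []

    def append(self, event_id):
        self.events.append(event_id)

    def read_from(self, offset):
        """Replay everything from a saved offset onward."""
        return self.events[offset:]


source = BoundedSource(capacity=3)
log = DurableLog()

# Ten events are produced while the collector is down.
for _ in range(10):
    log.append(source.emit())

# After restart: the puller sees only what the node still retains,
# while the log consumer resumes from its last saved offset (0 here).
pulled = source.poll(after_id=-1)
replayed = log.read_from(0)
lost = len(replayed) - len(pulled)
```

In this toy run the puller recovers only the last three events, while the log replays all
ten. How much a real pull-based SSM would lose depends on how long the nodes retain their
messages.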
Both models are workable, and we may need more discussion on this. What’s your
opinion?


> HDFS smart storage management
> -----------------------------
>
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFS-Smart-Storage-Management.pdf
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
> policy engine considering file attributes, metadata, data temperature, storage type, EC codec,
> available hardware capabilities, user/application preference and etc.
> Modified the title for re-purpose.
> We'd extend this effort some bit and aim to work on a comprehensive solution to provide
> smart storage management service in order for convenient, intelligent and effective utilizing
> of erasure coding or replicas, HDFS cache facility, HSM offering, and all kinds of tools (balancer,
> mover, disk balancer and so on) in a large cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


