hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Mon, 24 Oct 2016 23:56:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603629#comment-15603629 ]

Andrew Wang commented on HDFS-7343:

Hi Wei Zhou, thanks for the replies, inline:

bq. How about get the file range data by hook into DataXceiver or DFSInputStream?

The issue here is that DFSInputStream is client-side, and clients might lie to game the system.

It's also more challenging to ingest client metrics, since you can't poll a client. We'd probably
have to build some DN functionality to aggregate client metrics.

bq. Yes, you are right, so we won't make it a black box...

I'm looking forward to the next revision of the design doc :)

bq. The server may not have enough disk slots to accommodate more HDDs.

Sure, but you could also buy more servers to get more disk slots or RAM capacity. I like to
look at this at the cluster-level.

bq. HDD performance...decays dramatically compared with SSD... The 1.36X throughput and 1/3
latency over HDD are measured under a proper load for HDD. The improvement can be much higher.

Yea, it's tricky to set up the correct baseline for these comparisons. If the HDDs are overloaded,
then SSD looks comparatively better, but no one who cares about latency would overload their

bq. SSM tries to take actions to optimize the performance of the workload because SSM can
not know whether it's profitable or not in advance. If turned out to be no improvement, suppose
the action taken won't hurts the performance as well or the overhead is acceptable.

This will make it difficult for end users since they want reliable performance, particularly
when it comes to SLOs. Even if a job goes at SSD-level performance 95% of the time, the 5%
it doesn't will violate the SLO. This means the SLO still needs to be set to HDD-level performance,
meaning that we spent extra money for SSDs for the cluster but weren't able to improve the
guarantees to end users.
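
To make the arithmetic concrete, here is a minimal sketch. The latency numbers are made up for illustration (10 ms on SSD, 30 ms on HDD, loosely echoing the 1/3 latency figure quoted above), and `percentile` is a small helper written for this example, not anything from HDFS or SSM.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical per-run latencies in ms (assumed values, not measurements):
# 95 runs are served at SSD-level speed, 5 fall back to HDD-level speed.
SSD_MS = 10
HDD_MS = 30
latencies = [SSD_MS] * 95 + [HDD_MS] * 5

median = percentile(latencies, 50)   # typical run looks SSD-level
p99 = percentile(latencies, 99)      # tail is still HDD-level
worst = max(latencies)               # what an "every run" guarantee must cover

print(f"median={median}ms p99={p99}ms worst={worst}ms")
```

Under these assumptions the median is SSD-level, but the 99th percentile and the worst case are both HDD-level, so any SLO written against the tail has to be set as if the SSDs were not there.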

If SSM guesses wrong, it might even do worse than HDD-level performance due to additional
disk and network I/O from data movement.

bq. In your opinion, is there anything else for SSM to do to improve write performance?

No, not directly. If you're already looking at write performance, that's fine with me.

bq. <Kafka>

I don't have much else to add, though I'd encourage you to think about how far we can get
with a stateless system (possibly by pushing more work into the NN and DN). As soon as a service
is stateful, high availability becomes harder. This relates to SLOs as well; if a service
isn't highly available, then users can't set SLOs based on the service being up.

> HDFS smart storage management
> -----------------------------
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFS-Smart-Storage-Management.pdf
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
> policy engine considering file attributes, metadata, data temperature, storage type, EC codec,
> available hardware capabilities, user/application preference and etc.
> Modified the title for re-purpose.
> We'd extend this effort some bit and aim to work on a comprehensive solution to provide
> smart storage management service in order for convenient, intelligent and effective utilizing
> of erasure coding or replicas, HDFS cache facility, HSM offering, and all kinds of tools (balancer,
> mover, disk balancer and so on) in a large cluster.
