hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
Date Thu, 17 Sep 2015 01:25:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791410#comment-14791410 ]

Walter Su commented on HDFS-9090:
---------------------------------

bq. The placement policy in the erasure coding branch achieves the goal of spreading the data across racks.
It could still burden the local racks where the 10 Storm nodes are located. Maybe you can customize
a policy: just override {{chooseTargetInOrder}} so that every replica is picked with {{chooseRandom}}.
But that hurts the performance of YARN applications. HDFS-4894 or HDFS-7068 could be very helpful, but
neither is implemented yet. Maybe you should take the advice from [~stevel@apache.org] and use ingest nodes.
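
To make the trade-off concrete, here is a minimal standalone simulation, not HDFS code. It assumes the 500-node, 10-writer, replication-3 setup from the issue, ignores racks entirely, and uses hypothetical names ({{PlacementSketch}}, {{chooseTargets}}, {{writerShare}}) that do not correspond to any method of the real placement policy classes. It only compares how many replicas land on the writer nodes when the first replica is local versus when every replica is picked at random.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PlacementSketch {
    static final int NODES = 500;      // datanodes in the cluster (from the issue)
    static final int WRITERS = 10;     // assumption: nodes 0..9 host the writing clients
    static final int REPLICATION = 3;  // replication factor (from the issue)

    /** Picks REPLICATION distinct nodes; if localFirst, the first replica goes to the writer. */
    static List<Integer> chooseTargets(int writer, boolean localFirst, Random rnd) {
        List<Integer> targets = new ArrayList<>();
        if (localFirst) {
            targets.add(writer);
        }
        while (targets.size() < REPLICATION) {
            int candidate = rnd.nextInt(NODES);
            if (!targets.contains(candidate)) {
                targets.add(candidate);
            }
        }
        return targets;
    }

    /** Fraction of all replicas that end up on the writer nodes. */
    static double writerShare(boolean localFirst, int blocks, long seed) {
        Random rnd = new Random(seed);
        long onWriters = 0;
        for (int b = 0; b < blocks; b++) {
            int writer = b % WRITERS;  // round-robin over the writer nodes
            for (int node : chooseTargets(writer, localFirst, rnd)) {
                if (node < WRITERS) {
                    onWriters++;
                }
            }
        }
        return (double) onWriters / ((long) blocks * REPLICATION);
    }

    public static void main(String[] args) {
        System.out.printf("local-first: %.3f%n", writerShare(true, 100_000, 42L));
        System.out.printf("all random:  %.3f%n", writerShare(false, 100_000, 42L));
    }
}
```

With a local first replica, slightly more than one third of all replicas sit on the 10 writer nodes; with fully random placement the share drops to roughly 10/500. The sketch says nothing about the cost to locality-sensitive readers such as YARN applications, which is the downside mentioned above.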

> Write hot data on few nodes may cause performance issue
> -------------------------------------------------------
>
>                 Key: HDFS-9090
>                 URL: https://issues.apache.org/jira/browse/HDFS-9090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> (I am not sure whether this should be reported as a bug; feel free to change the issue type.)
> The current block placement policy makes a best effort to place the first replica on the local node whenever possible.
> Consider the following scenario:
> 1. There are 500 datanodes spread across many racks.
> 2. Raw user action logs (just an example) are written from only 10 nodes, which also run datanodes locally.
> 3. Then, before any rebalancing, every block of these logs has at least one replica on those 10 nodes. This implies that one third of the reads of these logs are served by those 10 nodes if the replication factor is 3, so performance suffers.
> I propose to address this scenario by introducing a client-side configuration entry that disables write locality at an arbitrary level.
> Then we can either (A) add the local nodes to excludedNodes, or (B) tell the NameNode which locality we prefer.
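
The two proposed options can be sketched as a toy target chooser. This is an illustration under stated assumptions, not the HDFS client or NameNode API: the class {{ExcludeLocalSketch}}, the {{WriteLocality}} enum, and the {{chooseTargets}} signature are all hypothetical, nodes are plain integer ids in the range 0..clusterSize-1, and racks are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class ExcludeLocalSketch {
    /** Hypothetical locality preference a client could send (option B). */
    enum WriteLocality { NODE_LOCAL, ANY }

    /**
     * Picks `replication` distinct node ids out of 0..clusterSize-1.
     * Option A: the client lists its own node in `excluded`.
     * Option B: the client asks for WriteLocality.ANY, so no replica is pinned locally.
     */
    static List<Integer> chooseTargets(int clusterSize, int replication, int localNode,
                                       WriteLocality locality, Set<Integer> excluded,
                                       Random rnd) {
        Set<Integer> skip = new HashSet<>(excluded);  // nodes that must not be chosen
        List<Integer> targets = new ArrayList<>();
        if (locality == WriteLocality.NODE_LOCAL && !skip.contains(localNode)) {
            targets.add(localNode);                   // keep the first replica local
            skip.add(localNode);
        }
        if (clusterSize - skip.size() < replication - targets.size()) {
            throw new IllegalArgumentException("not enough eligible nodes");
        }
        while (targets.size() < replication) {
            int candidate = rnd.nextInt(clusterSize);
            if (skip.add(candidate)) {                // true only if not skipped before
                targets.add(candidate);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7L);
        // Option A: the local node 3 is excluded, so it never receives a replica.
        System.out.println(chooseTargets(500, 3, 3, WriteLocality.NODE_LOCAL,
                Collections.singleton(3), rnd));
        // Option B: no locality preference; node 3 is just one candidate among 500.
        System.out.println(chooseTargets(500, 3, 3, WriteLocality.ANY,
                Collections.<Integer>emptySet(), rnd));
    }
}
```

In this toy model option (A) guarantees that the writer never stores a replica of its own data, while option (B) only removes the preference, so the writer can still be picked by chance.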



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
