Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 9 Sep 2014 16:07:29 +0000 (UTC)
From: "Daryn Sharp (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12733370.1407804402000.28404.1410278849133@Atlassian.JIRA>
In-Reply-To: <JIRA.12733370.1407804402000@Atlassian.JIRA>
References: <JIRA.12733370.1407804402000@Atlassian.JIRA>
 <JIRA.12733370.1407804402474@arcas>
Subject: [jira] [Commented] (HDFS-6840) Clients are always sent to the same
 datanode when read is off rack
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127156#comment-14127156 ] 

Daryn Sharp commented on HDFS-6840:
-----------------------------------

In addition to Jason's comment, I'm mildly concerned that the tests assume and hardcode the ordering produced by a given seed.  Presumably the JDK could change how seeding works at any time, which would cause test failures.  Note that a few months ago I saw a JDK bug about how Java's randomness isn't very random at all, so it's possible the ordering could change in the near future.
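
As a rough illustration of the alternative, a test can assert properties of the shuffled result instead of an exact seed-dependent order.  The sketch below is not taken from the attached patches: the class ShuffleOrderCheck, the helpers shuffleReplicas and firstChoices, and the datanode names are all hypothetical stand-ins, and it assumes the code under test shuffles replicas with a caller-supplied java.util.Random.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class ShuffleOrderCheck {

    // Hypothetical stand-in for the code under test: returns a shuffled
    // copy of the replica list, using the supplied source of randomness.
    static List<String> shuffleReplicas(List<String> replicas, Random rng) {
        List<String> copy = new ArrayList<>(replicas);
        Collections.shuffle(copy, rng);
        return copy;
    }

    // Shuffles the list `trials` times and collects every replica that
    // ended up first, i.e. every replica a client would be sent to.
    static Set<String> firstChoices(List<String> replicas, Random rng, int trials) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < trials; i++) {
            seen.add(shuffleReplicas(replicas, rng).get(0));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        // A fixed seed keeps a failing run reproducible, but no assertion
        // below depends on the particular order this seed yields.
        Random rng = new Random(12345L);

        // Property 1: the result is a permutation of the input.
        List<String> shuffled = shuffleReplicas(replicas, rng);
        if (shuffled.size() != replicas.size()
                || !new HashSet<>(shuffled).equals(new HashSet<>(replicas))) {
            throw new AssertionError("shuffle lost or duplicated a replica");
        }

        // Property 2: over many shuffles, clients are not always sent to
        // the same replica first.
        if (firstChoices(replicas, rng, 100).size() < 2) {
            throw new AssertionError("same replica was always chosen first");
        }
        System.out.println("ok");
    }
}
```

Checks like these should keep passing if the JDK changes how Random maps a seed to a sequence, since they only require that the output is a permutation and that more than one replica gets picked first.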

> Clients are always sent to the same datanode when read is off rack
> ------------------------------------------------------------------
>
>                 Key: HDFS-6840
>                 URL: https://issues.apache.org/jira/browse/HDFS-6840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Andrew Wang
>            Priority: Critical
>         Attachments: hdfs-6840.001.patch, hdfs-6840.002.patch
>
>
> After HDFS-6268 the sorting order of block locations is deterministic for a given block and locality level (e.g.: local, rack, off-rack), so off-rack clients all see the same datanode for the same block.  This leads to very poor behavior in distributed cache localization and other scenarios where many clients all want the same block data at approximately the same time.  The one datanode is crushed by the load while the other replicas only handle local and rack-local requests.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)