Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 18 Jul 2014 03:20:05 +0000 (UTC)
From: "Ashwin Shankar (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12709668.1398134585630.68670.1405653605877@arcas>
In-Reply-To: <JIRA.12709668.1398134585630@arcas>
References: <JIRA.12709668.1398134585630@arcas>
Subject: [jira] [Commented] (HDFS-6268) Better sorting in
 NetworkTopology#pseudoSortByDistance when no local node is found
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065966#comment-14065966 ] 

Ashwin Shankar commented on HDFS-6268:
--------------------------------------

Hi [~andrew.wang],
You're correct, this behavior is not a regression; we saw this issue before applying your patch too.

bq. The DFSClient should also be failing over to some other replica after a timeout, so I'm surprised your containers are getting stuck.
We run our clusters on Amazon AWS, and they don't differentiate between rack_local and off_switch nodes; off_switch nodes are considered rack_local as well. For big jobs, when containers go into their LOCALIZING phase, in which they download resources from HDFS, an off-switch datanode that is treated as rack-local gets bombarded by hundreds of tasks. Sometimes the resources to be downloaded are large (e.g. the hashtable in a Hive map join), and when an off-switch node gets hit by hundreds of tasks, containers take more than 10 minutes to download, by which time the AM times them out and kills them.

bq. Anyway, if you want to add a new config to not use a seed (default false), I'd be happy to review.
Thanks, Andrew! I've done that and posted a patch in HDFS-6701. This is a little urgent, so it would be very helpful if we could get it reviewed and committed quickly.
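
To illustrate why a seeded ordering can still concentrate load, here is a minimal, self-contained sketch. It is not the NetworkTopology code or the HDFS-6701 patch. It assumes the seed is derived from the block ID, so every reader of the same block gets the same replica order, and it models the proposed option as a plain boolean. The class name, method names, and the useSeed flag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch; names and structure are assumptions, not Hadoop's API.
class ReplicaOrderSketch {

    /**
     * Orders replicas that are all at the same (non-local) distance.
     * useSeed == true:  shuffle with a Random seeded by the block ID
     *                   (assumed), so the order is identical for every reader.
     * useSeed == false: shuffle with shared, unseeded randomness, so the
     *                   order varies from call to call.
     */
    static List<String> order(List<String> replicas, long blockId,
                              boolean useSeed, Random unseeded) {
        List<String> result = new ArrayList<>(replicas);
        Random rng = useSeed ? new Random(blockId) : unseeded;
        Collections.shuffle(result, rng);
        return result;
    }

    /** Counts how many of the given readers would try each node first. */
    static Map<String, Integer> firstChoiceCounts(List<String> replicas,
                                                  long blockId,
                                                  boolean useSeed,
                                                  int readers) {
        Random unseeded = new Random();
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < readers; i++) {
            String first = order(replicas, blockId, useSeed, unseeded).get(0);
            counts.merge(first, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        // Seeded: all 300 readers of this block pick the same first node.
        System.out.println("seeded:   "
            + firstChoiceCounts(replicas, 1234L, true, 300));
        // Unseeded: the first choice is spread across the replicas.
        System.out.println("unseeded: "
            + firstChoiceCounts(replicas, 1234L, false, 300));
    }
}
```

With the seed, hundreds of containers localizing the same resource all start with the same datanode, which matches the hot spot we observe. Without it, the first choice is spread roughly evenly, at the cost of losing a stable order per block.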


> Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6268
>                 URL: https://issues.apache.org/jira/browse/HDFS-6268
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch, hdfs-6268-3.patch, hdfs-6268-4.patch, hdfs-6268-5.patch, hdfs-6268-branch-2.001.patch
>
>
> In NetworkTopology#pseudoSortByDistance, if no local node is found, it will always place the first rack local node in the list in front.
> This became an issue when a dataset was loaded from a single datanode. This datanode ended up being the first replica for all the blocks in the dataset. When running an Impala query, the non-local reads when reading past a block boundary were all hitting this node, meaning massive load skew.

