Date: Fri, 18 Jul 2014 03:20:05 +0000 (UTC)
From: "Ashwin Shankar (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6268) Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found

    [ https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065966#comment-14065966 ]

Ashwin Shankar commented on HDFS-6268:
--------------------------------------

Hi [~andrew.wang],

You're correct, this behavior is not a regression; we saw this issue before applying your patch too.

bq. The DFSClient should also be failing over to some other replica after a timeout, so I'm surprised your containers are getting stuck.

We run our clusters on Amazon AWS, and they don't differentiate between rack_local and off_switch nodes: off_switch nodes are considered rack_local as well. For big jobs, when containers go into their LOCALIZING phase, in which they download resources from HDFS, an off_switch datanode that is treated as rack_local gets bombarded by hundreds of tasks. Sometimes the resources to be downloaded are large (for example, the hashtable in a Hive map join), and when an off_switch node gets hit by hundreds of tasks, containers take more than 10 minutes to download, by which time the AM times them out and kills them.

bq. Anyway, if you want to add a new config to not use a seed (default false), I'd be happy to review.

Thanks, Andrew! I've done that and posted a patch in HDFS-6701. This is a little urgent, so it would be very helpful if we could get it reviewed and committed quickly.

> Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6268
>                 URL: https://issues.apache.org/jira/browse/HDFS-6268
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch, hdfs-6268-3.patch, hdfs-6268-4.patch, hdfs-6268-5.patch, hdfs-6268-branch-2.001.patch
>
>
> In NetworkTopology#pseudoSortByDistance, if no local node is found, it will always place the first rack local node in the list in front.
> This became an issue when a dataset was loaded from a single datanode. This datanode ended up being the first replica for all the blocks in the dataset. When running an Impala query, the non-local reads when reading past a block boundary were all hitting this node, meaning massive load skew.
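
The load skew described in the issue summary can be illustrated with a minimal Python sketch. This is not Hadoop's NetworkTopology code: the function names, the (rack, host) tuples, and the distance constants are hypothetical and chosen only for illustration. It contrasts "always put the first rack-local replica in front" with shuffling replicas inside each distance tier, either with a fixed seed (repeatable order) or without one (a fresh order per call).

```python
import random
from collections import Counter

# Hypothetical distance weights for this sketch (not Hadoop constants).
LOCAL, RACK_LOCAL, OFF_SWITCH = 0, 1, 2


def distance(reader, node):
    """Coarse distance between two (rack, host) tuples."""
    if reader == node:
        return LOCAL
    if reader[0] == node[0]:
        return RACK_LOCAL
    return OFF_SWITCH


def first_rack_local_in_front(reader, replicas):
    """Behavior from the issue summary: prefer a local replica; if none
    exists, move the first rack-local replica in the list to the front."""
    result = list(replicas)
    for wanted in (LOCAL, RACK_LOCAL):
        for i, node in enumerate(result):
            if distance(reader, node) == wanted:
                result.insert(0, result.pop(i))
                return result
    return result


def shuffled_by_tier(reader, replicas, seed=None):
    """Group replicas by distance and shuffle within each group.
    A fixed seed gives a repeatable order; seed=None varies per call."""
    rng = random.Random(seed)
    tiers = {}
    for node in replicas:
        tiers.setdefault(distance(reader, node), []).append(node)
    result = []
    for d in sorted(tiers):  # nearest tier first
        rng.shuffle(tiers[d])
        result.extend(tiers[d])
    return result


# A dataset loaded from dn1, so dn1 is the first replica of every block.
reader = ("rack1", "client")
replicas = [("rack1", "dn1"), ("rack1", "dn2"), ("rack1", "dn3")]

# Count which datanode is tried first across 300 blocks.
old = Counter(first_rack_local_in_front(reader, replicas)[0][1]
              for _ in range(300))
new = Counter(shuffled_by_tier(reader, replicas, seed=block_id)[0][1]
              for block_id in range(300))
print("first rack-local in front:", dict(old))
print("shuffled within tier:     ", dict(new))
```

In this sketch the first strategy sends every non-local read to dn1, while the per-tier shuffle spreads first choices across all three rack-local replicas. Seeding by a per-block value (here a made-up `block_id`) keeps the order stable for a given block; passing `seed=None` corresponds loosely to the "do not use a seed" option discussed above.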