Date: Thu, 18 Feb 2016 22:25:18 +0000 (UTC)
From: "Tsz Wo Nicholas Sze (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9818) Correctly handle EC reconstruction work caused by not enough racks

    [ https://issues.apache.org/jira/browse/HDFS-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153241#comment-15153241 ]

Tsz Wo Nicholas Sze commented on HDFS-9818:
-------------------------------------------

- We may also make chooseSource4SimpleReplication static and slightly shorter.
{code}
private static int chooseSource4SimpleReplication(DatanodeDescriptor[] dds) {
  final Map<String, List<Integer>> map = new HashMap<>();
  for (int i = 0; i < dds.length; i++) {
    final String rack = dds[i].getNetworkLocation();
    List<Integer> list = map.get(rack);
    if (list == null) {
      list = new ArrayList<>();
      map.put(rack, list);
    }
    list.add(i);
  }
  List<Integer> max = null;
  for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
    if (max == null || entry.getValue().size() > max.size()) {
      max = entry.getValue();
    }
  }
  return max.get(0);
}
{code}

> Correctly handle EC reconstruction work caused by not enough racks
> ------------------------------------------------------------------
>
>                 Key: HDFS-9818
>                 URL: https://issues.apache.org/jira/browse/HDFS-9818
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>        Attachments: HDFS-9818.000.patch, HDFS-9818.001.patch
>
>
> This is reported by [~tfukudom]:
> In a system test where 1 of 7 datanode racks was stopped, {{HadoopIllegalArgumentException}} was seen on the DataNode side while reconstructing missing EC blocks:
> {code}
> 2016-02-16 11:09:06,672 WARN datanode.DataNode (ErasureCodingWorker.java:run(482)) - Failed to recover striped block: BP-480558282-172.29.4.13-1453805190696:blk_-9223372036850962784_278270
> org.apache.hadoop.HadoopIllegalArgumentException: Inputs not fully corresponding to erasedIndexes in null places.
> erasedOrNotToReadIndexes: [1, 2, 6], erasedIndexes: [3]
> 	at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.doDecode(RSRawDecoder.java:166)
> 	at org.apache.hadoop.io.erasurecode.rawcoder.AbstractRawErasureDecoder.decode(AbstractRawErasureDecoder.java:84)
> 	at org.apache.hadoop.io.erasurecode.rawcoder.RSRawDecoder.decode(RSRawDecoder.java:89)
> 	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.recoverTargets(ErasureCodingWorker.java:683)
> 	at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:465)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
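The suggested chooseSource4SimpleReplication can be exercised outside the NameNode as a standalone sketch. This is not HDFS code: the hypothetical RackChooser class below replaces DatanodeDescriptor#getNetworkLocation() with plain rack strings, but the selection logic (group node indices by rack, return the first index in the most populated rack) is the same:

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RackChooser {
  /**
   * Returns the index of a node in the most populated rack.
   * Stand-in for the proposed static chooseSource4SimpleReplication,
   * with rack strings replacing DatanodeDescriptor locations.
   */
  static int chooseSource4SimpleReplication(String[] racks) {
    final Map<String, List<Integer>> map = new HashMap<>();
    for (int i = 0; i < racks.length; i++) {
      // Group node indices by rack location.
      map.computeIfAbsent(racks[i], k -> new ArrayList<>()).add(i);
    }
    List<Integer> max = null;
    for (List<Integer> list : map.values()) {
      if (max == null || list.size() > max.size()) {
        max = list;
      }
    }
    // First node index in the largest rack group.
    return max.get(0);
  }

  public static void main(String[] args) {
    // /rackB holds two nodes (indices 1 and 3); index 1 is chosen.
    String[] racks = {"/rackA", "/rackB", "/rackC", "/rackB"};
    System.out.println(chooseSource4SimpleReplication(racks));
  }
}
{code}

Using Map#computeIfAbsent shortens the grouping loop relative to the get/null-check/put pattern in the comment above, with identical behavior.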