Date: Tue, 6 Jan 2015 02:11:37 +0000 (UTC)
From: "Arpit Agarwal (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

    [ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265558#comment-14265558 ]

Arpit Agarwal edited comment on HDFS-7575 at 1/6/15 2:11 AM:
-------------------------------------------------------------

I'm testing a fix and expect to post a patch by tomorrow. It will also fix the storageMap issue.

was (Author: arpitagarwal):
I'm testing a fix and expect to post a patch by tomorrow. It will also fix for the storageMap issue.
> NameNode not handling heartbeats properly after HDFS-2832
> ---------------------------------------------------------
>
>                 Key: HDFS-7575
>                 URL: https://issues.apache.org/jira/browse/HDFS-7575
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0
>            Reporter: Lars Francke
>            Assignee: Arpit Agarwal
>            Priority: Critical
>
> Before HDFS-2832, each DataNode had a single unique storageId which included its IP address. Since HDFS-2832, DataNodes have a unique storageId per storage directory, which is just a random UUID.
>
> They send reports per storage directory in their heartbeats. These heartbeats are processed on the NameNode in the {{DatanodeDescriptor#updateHeartbeatState}} method. Pre-HDFS-2832 this would just store the information per DataNode. After the patch, each DataNode can have multiple storages, so the information is stored in a map keyed by the storage Id.
>
> This works fine for all clusters installed post-HDFS-2832, as they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 different keys. On each heartbeat the map is searched and updated ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
>
> {code:title=DatanodeStorageInfo}
> void updateState(StorageReport r) {
>   capacity = r.getCapacity();
>   dfsUsed = r.getDfsUsed();
>   remaining = r.getRemaining();
>   blockPoolUsed = r.getBlockPoolUsed();
> }
> {code}
>
> On clusters that were upgraded from a pre-HDFS-2832 version, however, the storage Id has not been rewritten (at least not on the four clusters I checked), so each directory has the exact same storageId. That means there will be only a single entry in the {{storageMap}}, and it will be overwritten by a random {{StorageReport}} from the DataNode. This can be seen in the {{updateState}} method above: it just assigns the capacity from the received report, when it should probably sum the values per received heartbeat.
> The Balancer seems to be one of the only things that actually uses this information, so it now considers the utilization of a random drive per DataNode for balancing purposes.
>
> Things get even worse when a drive has been added or replaced, as that drive gets a new storage Id, so there will be two entries in the storageMap. As new drives are usually empty, this skews the Balancer's decision in such a way that the node will never be considered over-utilized.
>
> Another problem is that old StorageReports are never removed from the storageMap. So if I replace a drive and it gets a new storage Id, the old one will still be in place and used for all calculations by the Balancer until the NameNode is restarted.
>
> I can try providing a patch that does the following:
> * Instead of using a Map, either store the array we receive, or sum up the values for reports with the same Id
> * On each heartbeat, clear the map (so we know we have up-to-date information)
>
> Does that sound sensible?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
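The overwrite described in the report can be illustrated with a small standalone sketch. This is a hypothetical, simplified stand-in for the NameNode's per-DataNode storageMap (tracking only capacity, where the real code keeps a full {{DatanodeStorageInfo}}); it only demonstrates the map-keying behavior, not the actual HDFS classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NameNode's storageMap, keyed by storage Id.
public class StorageMapDemo {
    public static void main(String[] args) {
        Map<String, Long> storageMap = new HashMap<>();

        // Fresh post-HDFS-2832 install: a unique UUID per storage directory,
        // so every directory keeps its own entry in the map.
        storageMap.put("uuid-1", 100L);
        storageMap.put("uuid-2", 200L);
        System.out.println(storageMap.size()); // 2

        // Upgraded cluster: every directory reports the same legacy storage Id,
        // so each report overwrites the previous one and only the last survives.
        storageMap.clear();
        storageMap.put("legacy-id", 100L); // report for directory 1
        storageMap.put("legacy-id", 200L); // report for directory 2 overwrites it
        System.out.println(storageMap.size());           // 1
        System.out.println(storageMap.get("legacy-id")); // 200
    }
}
```

The NameNode thus sees a single directory's worth of capacity for the whole DataNode, which is why the Balancer ends up working from one random drive's utilization.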
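The two bullet points above (sum values for reports with the same Id; clear the map on each heartbeat) could be sketched as follows. {{applyHeartbeat}} is a hypothetical helper, not NameNode code, and it assumes summing per storage Id is the desired merge behavior:

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatMergeSketch {
    // Clear-then-merge per heartbeat: stale entries (e.g. a replaced drive's
    // old storage Id) are dropped, and reports that share an Id are summed
    // instead of overwriting one another.
    static Map<String, Long> applyHeartbeat(Map<String, Long> storageMap,
                                            String[] ids, long[] capacities) {
        storageMap.clear();
        for (int i = 0; i < ids.length; i++) {
            storageMap.merge(ids[i], capacities[i], Long::sum);
        }
        return storageMap;
    }

    public static void main(String[] args) {
        Map<String, Long> map = new HashMap<>();

        // Upgraded cluster: two directories sharing one legacy Id now sum to 300
        // rather than collapsing to whichever report arrived last.
        applyHeartbeat(map, new String[]{"legacy-id", "legacy-id"},
                       new long[]{100L, 200L});
        System.out.println(map.get("legacy-id")); // 300

        // After replacing the drive, the next heartbeat carries only the new Id,
        // so the stale entry is gone without a NameNode restart.
        applyHeartbeat(map, new String[]{"uuid-new"}, new long[]{500L});
        System.out.println(map.containsKey("legacy-id")); // false
    }
}
```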