hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832
Date Wed, 21 Jan 2015 19:59:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286169#comment-14286169
] 

Colin Patrick McCabe commented on HDFS-7575:
--------------------------------------------

This patch does change the layout format.  It changes it from one where storage ID may or
may not be unique to one where it definitely is.

Can you response to the practical points I made above?  I made a few points that nobody has
responded to yet.
* Changing the storage ID during startup basically changes storage ID from being a permanent
identifier to a temporary one... makes persisting this later impossible.  It commits us to
an architecture where block locations can't be persisted.
* With approach #1, we have to carry the burden of the dedupe code forever.
* Approach #1 degrades error handling.  If you somehow end up with two volumes that map to
the same directory, the code silently does the wrong thing.

I would appreciate a response to these.  thanks

> NameNode not handling heartbeats properly after HDFS-2832
> ---------------------------------------------------------
>
>                 Key: HDFS-7575
>                 URL: https://issues.apache.org/jira/browse/HDFS-7575
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0
>            Reporter: Lars Francke
>            Assignee: Arpit Agarwal
>            Priority: Critical
>         Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, HDFS-7575.03.binary.patch,
HDFS-7575.03.patch, HDFS-7575.04.binary.patch, HDFS-7575.04.patch, HDFS-7575.05.binary.patch,
HDFS-7575.05.patch, testUpgrade22via24GeneratesStorageIDs.tgz, testUpgradeFrom22GeneratesStorageIDs.tgz,
testUpgradeFrom24PreservesStorageId.tgz
>
>
> Before HDFS-2832 each DataNode would have a unique storageId which included its IP address.
Since HDFS-2832 the DataNodes have a unique storageId per storage directory which is just
a random UUID.
> They send reports per storage directory in their heartbeats. This heartbeat is processed
on the NameNode in the {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this
would just store the information per Datanode. After the patch though each DataNode can have
multiple different storages so it's stored in a map keyed by the storage Id.
> This works fine for all clusters that have been installed post HDFS-2832 as they get
a UUID for their storage Id. So a DN with 8 drives has a map with 8 different keys. On each
Heartbeat the Map is searched and updated ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
> {code:title=DatanodeStorageInfo}
>   void updateState(StorageReport r) {
>     capacity = r.getCapacity();
>     dfsUsed = r.getDfsUsed();
>     remaining = r.getRemaining();
>     blockPoolUsed = r.getBlockPoolUsed();
>   }
> {code}
> On clusters that were upgraded from a pre HDFS-2832 version though the storage Id has
not been rewritten (at least not on the four clusters I checked) so each directory will have
the exact same storageId. That means there'll be only a single entry in the {{storageMap}}
and it'll be overwritten by a random {{StorageReport}} from the DataNode. This can be seen
in the {{updateState}} method above. This just assigns the capacity from the received report,
instead it should probably sum it up per received heartbeat.
> The Balancer seems to be one of the only things that actually uses this information so
it now considers the utilization of a random drive per DataNode for balancing purposes.
> Things get even worse when a drive has been added or replaced as this will now get a
new storage Id so there'll be two entries in the storageMap. As new drives are usually empty
it skewes the balancers decision in a way that this node will never be considered over-utilized.
> Another problem is that old StorageReports are never removed from the storageMap. So
if I replace a drive and it gets a new storage Id the old one will still be in place and used
for all calculations by the Balancer until a restart of the NameNode.
> I can try providing a patch that does the following:
> * Instead of using a Map I could just store the array we receive or instead of storing
an array sum up the values for reports with the same Id
> * On each heartbeat clear the map (so we know we have up to date information)
> Does that sound sensible?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message