hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832
Date Mon, 12 Jan 2015 20:32:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274124#comment-14274124

Colin Patrick McCabe commented on HDFS-7575:

[~daryn]: I think a new layout version makes sense here.  Basically we are going from a layout
where the storageID might not have been unique, to one where it is.  This is a change in the
VERSION file.  It's nice to have the same guarantees that we usually do (that if the ugprade
fails, you can roll back via the {{previous}} directory, and so forth.)  We could probably
be more "clever" here and optimize this so we didn't have to hardlink the block files, but
the upgrade path is already a little too "clever" and I think this is fine.

Rather than calling the new layout version UPGRADE_GENERATES_STORAGE_IDS, how about calling
it something like UNIQUE_STORAGE_IDS or GUARANTEED_UNIQUE_STORAGE_IDS?  That describes what
the new layout is, rather than what the process of upgrading is, consistent with our other
layout version descriptions.

110    ... = new ClusterVerifier() {
111	      @Override
112	      public void verifyClusterPostUpgrade(MiniDFSCluster cluster) throws IOException
113	        // Verify that a GUID-based storage ID was generated.
114	        final String bpid = cluster.getNamesystem().getBlockPoolId();
115	        StorageReport[] reports =
116	            cluster.getDataNodes().get(0).getFSDataset().getStorageReports(bpid);
117	        assertThat(reports.length, is(1));
118	        final String storageID = reports[0].getStorage().getStorageID();
119	        assertTrue(DatanodeStorage.isValidStorageId(storageID));
120	      }
It seems like this exact code appears in 3 different tests.  We should just make this Verifier
a static object that's created once in the test or something?

+1 once these are addressed.  [~daryn], please take a look if you can... we'd really like
to fix this one.  Thanks, guys

> NameNode not handling heartbeats properly after HDFS-2832
> ---------------------------------------------------------
>                 Key: HDFS-7575
>                 URL: https://issues.apache.org/jira/browse/HDFS-7575
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0
>            Reporter: Lars Francke
>            Assignee: Arpit Agarwal
>            Priority: Critical
>         Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, HDFS-7575.03.binary.patch,
HDFS-7575.03.patch, testUpgrade22via24GeneratesStorageIDs.tgz, testUpgradeFrom22GeneratesStorageIDs.tgz,
> Before HDFS-2832 each DataNode would have a unique storageId which included its IP address.
Since HDFS-2832 the DataNodes have a unique storageId per storage directory which is just
a random UUID.
> They send reports per storage directory in their heartbeats. This heartbeat is processed
on the NameNode in the {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this
would just store the information per Datanode. After the patch though each DataNode can have
multiple different storages so it's stored in a map keyed by the storage Id.
> This works fine for all clusters that have been installed post HDFS-2832 as they get
a UUID for their storage Id. So a DN with 8 drives has a map with 8 different keys. On each
Heartbeat the Map is searched and updated ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
> {code:title=DatanodeStorageInfo}
>   void updateState(StorageReport r) {
>     capacity = r.getCapacity();
>     dfsUsed = r.getDfsUsed();
>     remaining = r.getRemaining();
>     blockPoolUsed = r.getBlockPoolUsed();
>   }
> {code}
> On clusters that were upgraded from a pre HDFS-2832 version though the storage Id has
not been rewritten (at least not on the four clusters I checked) so each directory will have
the exact same storageId. That means there'll be only a single entry in the {{storageMap}}
and it'll be overwritten by a random {{StorageReport}} from the DataNode. This can be seen
in the {{updateState}} method above. This just assigns the capacity from the received report,
instead it should probably sum it up per received heartbeat.
> The Balancer seems to be one of the only things that actually uses this information so
it now considers the utilization of a random drive per DataNode for balancing purposes.
> Things get even worse when a drive has been added or replaced as this will now get a
new storage Id so there'll be two entries in the storageMap. As new drives are usually empty
it skewes the balancers decision in a way that this node will never be considered over-utilized.
> Another problem is that old StorageReports are never removed from the storageMap. So
if I replace a drive and it gets a new storage Id the old one will still be in place and used
for all calculations by the Balancer until a restart of the NameNode.
> I can try providing a patch that does the following:
> * Instead of using a Map I could just store the array we receive or instead of storing
an array sum up the values for reports with the same Id
> * On each heartbeat clear the map (so we know we have up to date information)
> Does that sound sensible?

This message was sent by Atlassian JIRA

View raw message