hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
Date Fri, 01 Feb 2013 20:46:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569063#comment-13569063 ]

Chris Nauroth commented on HDFS-4462:
-------------------------------------

Hi, Aaron.  The code looks good.  I applied the patch to branch-2 and ran multiple test suites
related to checkpoints and 2NN.

{code}
-  boolean isSameCluster(FSImage si) {
-    return namespaceID == si.getStorage().namespaceID &&
-      clusterID.equals(si.getClusterID()) &&
-      blockpoolID.equals(si.getBlockPoolID());
+  boolean namespaceIdMatches(FSImage si) {
+    return namespaceID == si.getStorage().namespaceID;
   }
{code}

Considering that namespace ID is an integer, whereas cluster ID is based on a GUID, comparing
only the namespace ID seems to carry a higher likelihood of accidental collision.  In that case,
{{CheckpointSignature#validateStorageInfo}} could misidentify a match.  Such a collision is
still highly unlikely, but the probability is non-zero.

I'm wondering if a safer change would be (pseudo-code):

{code}
if namespace ID + cluster ID + blockpool ID are defined on both
  compare all 3 fields
else if one of them has only namespace ID defined
  compare only namespace ID
{code}

This would keep the logic the same for upgrades between 2 post-federation versions, and just
change the logic for the case of pre-fed -> post-fed.
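
As a rough Java sketch of the pseudo-code above: {{StorageIds}}, {{hasFederationIds}}, {{isPreFederation}}
and {{matches}} are hypothetical names used only for illustration, not part of the patch or of HDFS.
It also assumes that a pre-federation image reports empty cluster and blockpool IDs, and that a
partially defined pair of IDs (a case the pseudo-code does not cover) should be treated as a mismatch.

```java
/** Hypothetical holder for the three identifiers being compared (not an HDFS class). */
final class StorageIds {
  final int namespaceID;
  final String clusterID;   // assumed empty for pre-federation metadata
  final String blockpoolID; // assumed empty for pre-federation metadata

  StorageIds(int namespaceID, String clusterID, String blockpoolID) {
    this.namespaceID = namespaceID;
    // Normalize null to empty so the checks below only deal with one "undefined" form.
    this.clusterID = clusterID == null ? "" : clusterID;
    this.blockpoolID = blockpoolID == null ? "" : blockpoolID;
  }

  /** True when both cluster ID and blockpool ID are defined. */
  boolean hasFederationIds() {
    return !clusterID.isEmpty() && !blockpoolID.isEmpty();
  }

  /** True when only the namespace ID is defined. */
  boolean isPreFederation() {
    return clusterID.isEmpty() && blockpoolID.isEmpty();
  }

  static boolean matches(StorageIds a, StorageIds b) {
    if (a.hasFederationIds() && b.hasFederationIds()) {
      // Post-fed -> post-fed: compare all 3 fields, as before the patch.
      return a.namespaceID == b.namespaceID
          && a.clusterID.equals(b.clusterID)
          && a.blockpoolID.equals(b.blockpoolID);
    }
    if (a.isPreFederation() || b.isPreFederation()) {
      // Pre-fed -> post-fed: only the namespace ID can be compared.
      return a.namespaceID == b.namespaceID;
    }
    // Assumption: IDs that are only partially defined are treated as a mismatch.
    return false;
  }
}
```

With the values from the error message in the issue description, a post-federation signature with
namespace ID 403832480 would match a pre-federation image with the same namespace ID and empty
cluster and blockpool IDs, while two post-federation signatures would still need all 3 fields to agree.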

Or am I being too paranoid?  :-)

                
> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-4462
>                 URL: https://issues.apache.org/jira/browse/HDFS-4462
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an upgrade with
> respect to the NN's metadata (i.e. the layout versions are different) and in this case it
> will proceed with the checkpoint despite storage signatures not matching precisely if the
> BP ID and Cluster ID do match exactly. However, in situations where we're upgrading from versions
> of HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints will always fail
> with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
