hadoop-hdfs-dev mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
Date Thu, 07 May 2015 17:18:01 GMT
Ravi Prakash created HDFS-8344:

             Summary: NameNode doesn't recover lease for files with missing blocks
                 Key: HDFS-8344
                 URL: https://issues.apache.org/jira/browse/HDFS-8344
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.7.0
            Reporter: Ravi Prakash
            Assignee: Ravi Prakash

I found another(?) instance in which the lease is not recovered. It is easily reproducible
on a pseudo-distributed single-node cluster:

# Before you start, it helps to set the following. This is not necessary; it simply reduces how
long you have to wait:
      public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
      public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;
# Client starts to write a file. (It can be less than 1 block, but the client has hflushed, so
some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate
a jar and run it using $ hadoop jar TestHadoop.jar)
# Client crashes. (I simulate this with kill -9 on the $(hadoop jar TestHadoop.jar) process after
it has printed "Wrote to the bufferedWriter".)
# Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1)
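
To make the timing in the steps above concrete, here is a minimal, self-contained sketch of when
the shortened limits would elapse after the client's last lease renewal. It is a simplified model
under stated assumptions, not the NameNode's actual code: the class name LeaseTimingSketch and the
two helper methods are hypothetical and only illustrate the arithmetic on the two constants.

```java
// Hypothetical, simplified model of lease expiry timing; not NameNode code.
final class LeaseTimingSketch {
    // Shortened values from the reproduction step above.
    static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
    static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;

    /** True once more than the soft limit has passed since the last renewal. */
    static boolean expiredSoftLimit(long lastRenewalMs, long nowMs) {
        return nowMs - lastRenewalMs > LEASE_SOFTLIMIT_PERIOD;
    }

    /** True once more than the hard limit has passed since the last renewal. */
    static boolean expiredHardLimit(long lastRenewalMs, long nowMs) {
        return nowMs - lastRenewalMs > LEASE_HARDLIMIT_PERIOD;
    }

    public static void main(String[] args) {
        // Assumption: the client last renewed its lease at t = 0, just before kill -9.
        long lastRenewalMs = 0L;
        for (long nowMs : new long[] {10_000L, 45_000L, 61_000L}) {
            System.out.printf("t=%ds soft=%b hard=%b%n",
                    nowMs / 1000,
                    expiredSoftLimit(lastRenewalMs, nowMs),
                    expiredHardLimit(lastRenewalMs, nowMs));
        }
    }
}
```

With these values the soft limit elapses after 30 seconds and the hard limit after 60 seconds, so
roughly a minute after the kill -9 is the earliest point at which one would expect the NameNode to
start recovering the lease on its own.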

I believe the lease should be recovered and the block should be marked missing. However, this
is not happening: the lease is never recovered.
The effect of this bug for us was that nodes could not be decommissioned cleanly. Although
we knew that the client had crashed, the NameNode never released the leases, even after restarting
the NameNode and even months afterwards. There are actually several other cases where we
don't consider what happens if ALL the datanodes die while the file is being written, but
I am going to punt on those for another time.
