hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
Date Mon, 05 Dec 2016 22:24:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723538#comment-15723538
] 

Wei-Chiu Chuang edited comment on HDFS-8344 at 12/5/16 10:23 PM:
-----------------------------------------------------------------

I believe I saw a variant of this bug.
Symptom: an HBase WAL file's last block went missing. According to the NameNode these blocks are 83 bytes,
and the affected files can neither be deleted nor recovered via the hdfs debug recoverLease command.

I am aware of at least one scenario where this can happen: performing a volume hot-swap.
I suppose other operations, such as datanode decommissioning, might also lead to the same symptom.
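
For context, the following is a minimal sketch of the recovery attempt that fails in this state. It is the programmatic counterpart of hdfs debug recoverLease; the class name and path handling are illustrative only, not taken from the cluster above, and it assumes the path lives on HDFS.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Illustrative sketch only: try to force lease recovery on a stuck file.
public class TryRecoverLease {
  public static void main(String[] args) throws Exception {
    Path file = new Path(args[0]);  // e.g. the stuck WAL file under the -splitting directory
    FileSystem fs = file.getFileSystem(new Configuration());
    // Cast assumes an HDFS path; recoverLease() returns true once the file is closed.
    boolean closed = ((DistributedFileSystem) fs).recoverLease(file);
    // For the files described above this stays false on every attempt,
    // while the NameNode emits the log lines quoted below each time.
    System.out.println("recoverLease returned " + closed);
  }
}
{code}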

When a file is in this state, the NameNode logs entries like the following:
{quote}
2016-10-12 16:39:32,885 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
[Lease.  Holder: DFSClient_NONMAPREDUCE_751860335_1, pendingcreates: 3], src=/hbase/WALs/hadoopdev6.example.com,60020,1465915112353-splitting/hadoopdev6.example.com%2C60020%2C1465915112353.null0.1471340470256
2016-10-12 16:39:32,885 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease:
[Lease.  Holder: DFSClient_NONMAPREDUCE_751860335_1, pendingcreates: 3], src=/hbase/WALs/hadoopdev6.example.com,60020,1465915112353-splitting/hadoopdev6.example.com%2C60020%2C1465915112353.null0.1471340470256
from client DFSClient_NONMAPREDUCE_751860335_1


2016-10-12 16:39:32,885 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020:
org.apache.hadoop.hdfs.protocol.ClientProtocol.recoverLease from 192.168.4.86:35276 Call#708331
Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2016-10-12 16:39:32,885 DEBUG org.apache.hadoop.ipc.Server:  got #708331
2016-10-12 16:39:32,886 WARN BlockStateChange: BLOCK* BlockInfoUnderConstruction.initLeaseRecovery:
No blocks found, lease removed.
2016-10-12 16:39:32,886 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease:
File /hbase/WALs/hadoopdev6.example.com,60020,1465915112353-splitting/hadoopdev6.example.com%2C60020%2C1465915112353.null0.1471340470256
has not been closed. Lease recovery is in progress. RecoveryId = 1099610544470 for block blk_1074528903_1099610239882{blockUCState=UNDER_RECOVERY,
primaryNodeIndex=-1, replicas=[]}
2016-10-12 16:39:32,887 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020:
responding to org.apache.hadoop.hdfs.protocol.ClientProtocol.recoverLease from 192.168.4.86:35276
Call#708331 Retry#0 Wrote 36 bytes.
2016-10-12 16:39:32,887 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020:
responding to org.apache.hadoop.hdfs.protocol.ClientProtocol.recoverLease from 192.168.4.86:35276
Call#708331 Retry#0
2016-10-12 16:39:32,887 DEBUG org.apache.hadoop.ipc.Server: Served: recoverLease queueTime=
0 procesingTime= 2
{quote}

And the lease recovery attempt repeats every minute, non-stop; with replicas=[] there appears to be no datanode that can be chosen as the primary for block recovery, so recovery never completes.


> NameNode doesn't recover lease for files with missing blocks
> ------------------------------------------------------------
>
>                 Key: HDFS-8344
>                 URL: https://issues.apache.org/jira/browse/HDFS-8344
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch,
> HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch,
> HDFS-8344.10.patch, TestHadoop.java
>
>
> I found another(?) instance in which the lease is not recovered. This is easily reproducible
> on a pseudo-distributed single-node cluster.
> # Before you start, it helps to lower the lease limits as follows. This is not necessary, but it
> simply reduces how long you have to wait:
> {code}
>       public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>       public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it has hflushed, so some
> of the data has landed on the datanodes.) (I'm copying the client code I am using; see the attached
> TestHadoop.java. I generate a jar and run it using $ hadoop jar TestHadoop.jar. A minimal sketch of
> such a client also appears after this list.)
> # Client crashes. (I simulate this by kill -9 of the $(hadoop jar TestHadoop.jar) process
> after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1)
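>
> The sketch below is an assumed shape of such a writer, not the attached TestHadoop.java: it creates
> a file, writes a little, hflushes so the data reaches the datanode, prints the marker line, and then
> blocks so the process can be killed while the file is still under construction. The output path is illustrative.
> {code}
> import java.io.BufferedWriter;
> import java.io.OutputStreamWriter;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Hypothetical reconstruction of the test client described above.
> public class LeaseTestClient {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     FSDataOutputStream out = fs.create(new Path("/tmp/lease-test"));  // illustrative path
>     BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
>     writer.write("less than one block of data\n");
>     writer.flush();                 // push the bytes from the BufferedWriter into the stream
>     out.hflush();                   // make sure the data has landed on the datanode
>     System.out.println("Wrote to the bufferedWriter");
>     Thread.sleep(Long.MAX_VALUE);   // never close the file; kill -9 this process now
>   }
> }
> {code}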
> I believe the lease should be recovered and the block should be marked missing. However,
> this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned cleanly. Although
> we knew that the client had crashed, the Namenode never released the leases (even after restarting
> the Namenode, and even months afterwards). There are actually several other cases where we
> don't consider what happens if ALL the datanodes die while the file is being written, but
> I am going to punt on those for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

