hadoop-hdfs-issues mailing list archives

From "Santosh Marella (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
Date Mon, 20 May 2019 09:48:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843647#comment-16843647
] 

Santosh Marella edited comment on HDFS-12914 at 5/20/19 9:47 AM:
-----------------------------------------------------------------

[~jojochuang] - We might have hit this issue recently, and some initial investigation points to the scenario [~daryn] describes in the description. I'm new to this area, but
from what I have learned so far, it seems HDFS-14314 fixes a related, but slightly different,
scenario. Please see below.

 

We have a DN with 12 disks. When we restarted a standby NN, the DN registered itself, obtained
an FBR lease, and sent an FBR containing 12 storage reports, one per disk. However, only 3
of them were processed; the remaining 9 were dropped because the lease had expired before
the 4th report was processed. This essentially meant the FBR was only partially processed, and, in our case,
this *might* be one of the reasons the NN is taking so long to come out of safe mode
(the safe block count takes too long to reach the threshold because of the partial FBR processing).
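
(As an aside: if I'm reading BlockReportLeaseManager correctly, the lease behavior is governed by a couple of NN-side settings. The snippet below is just a sketch of how we're double-checking the effective values on our end; the property names and defaults are my assumption from the 2.x code, so please correct me if they're off.)
{code:java}
// Sketch only: print the effective FBR lease settings the NN would use.
// The property names/defaults below are my assumption (taken from what I
// believe BlockReportLeaseManager reads), not verified against 2.9.2 source.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class PrintBrLeaseSettings {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    long leaseLengthMs = conf.getLong(
        "dfs.namenode.full.block.report.lease.length.ms", 5L * 60 * 1000);
    int maxLeases = conf.getInt(
        "dfs.namenode.max.full.block.report.leases", 6);
    System.out.println("FBR lease length (ms): " + leaseLengthMs);
    System.out.println("Max concurrent FBR leases: " + maxLeases);
  }
}
{code}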

 

Some raw notes I've collected while investigating this issue. Sorry for being verbose, but
I hope it helps everyone. We are on Hadoop 2.9.2.
----
Dug through the logs and observed the following for a DN for which only 3 out of 12 reports were processed.
The DN registered itself with the NN, then sent an FBR containing 12 reports (one per
disk). The NN processed 3 of them (indicated by the *Processing first storage report* and *processing
time* log entries). However, for the 9 remaining reports, it printed *lease
xxx is not valid for DN* messages.

 
{code:java}
2019-05-16 15:15:35,028 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode:
from DatanodeRegistration(10.54.63.120:50010, datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d,
infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005)
storage 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:15:35,028 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
Registered DN 7493442a-c552-43f4-b6bd-728be292f66d (10.54.63.120:50010).
2019-05-16 15:31:11,578 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing
first storage report for DS-3e8d8352-ecc9-45cb-a39b-86f10d8aa386 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:11,941 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from
storage DS-3e8d8352-ecc9-45cb-a39b-86f10d8aa386 node DatanodeRegistration(10.54.63.120:50010,
datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020,
storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005),
blocks: 12690, hasStaleStorage: true, processing time: 363 msecs, invalidatedBlocks: 0
2019-05-16 15:31:17,496 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing
first storage report for DS-600c8ca1-3f99-41fc-a784-f663b928fe21 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:17,851 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from
storage DS-600c8ca1-3f99-41fc-a784-f663b928fe21 node DatanodeRegistration(10.54.63.120:50010,
datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020,
storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005),
blocks: 12670, hasStaleStorage: true, processing time: 355 msecs, invalidatedBlocks: 0
2019-05-16 15:31:23,465 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: Processing
first storage report for DS-3e7dc4c5-ab4e-40d1-8f32-64fe28081f94 from datanode 7493442a-c552-43f4-b6bd-728be292f66d
2019-05-16 15:31:23,821 INFO BlockStateChange: BLOCK* processReport 0xb4fb52822c9e3f03: from
storage DS-3e7dc4c5-ab4e-40d1-8f32-64fe28081f94 node DatanodeRegistration(10.54.63.120:50010,
datanodeUuid=7493442a-c552-43f4-b6bd-728be292f66d, infoPort=50075, infoSecurePort=0, ipcPort=50020,
storageInfo=lv=-57;cid=CID-f4a0a2ae-9e3d-41d5-b98a-e0e77ed0249b;nsid=682930173;c=1406912757005),
blocks: 12698, hasStaleStorage: true, processing time: 356 msecs, invalidatedBlocks: 0
2019-05-16 15:31:29,419 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
Removing expired block report lease 0xfd013f0084d0ed2d for DN 7493442a-c552-43f4-b6bd-728be292f66d.
2019-05-16 15:31:29,419 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the lease has expired.
2019-05-16 15:31:35,891 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:31:46,775 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:31:57,719 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:32:08,791 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:32:19,706 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:32:30,660 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:32:42,248 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.
2019-05-16 15:32:52,921 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockReportLeaseManager:
BR lease 0xfd013f0084d0ed2d is not valid for DN 7493442a-c552-43f4-b6bd-728be292f66d, because
the DN is not in the pending set.{code}
On the DN side, we can see that it sent 12 reports in the FBR in a single RPC:
{code:java}
2019-05-16 15:33:03,692 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block
report 0xb4fb52822c9e3f03,  containing 12 storage report(s), of which we sent 12. The reports
had 151730 total blocks and used 1 RPC(s). This took 37 msec to generate and 208787 msecs
for RPC and NN processing. Got back no commands.
{code}
Looking at the code, it indeed appears that the lease expired before all 12 reports were
processed (the NN removed the lease as expired at 15:31:29, between the 3rd and 4th reports):
{code:java}
@Override // DatanodeProtocol
public DatanodeCommand blockReport(final DatanodeRegistration nodeReg,
      String poolId, final StorageBlockReport[] reports,
      final BlockReportContext context) throws IOException {
  checkNNStartup();
  verifyRequest(nodeReg);
  if(blockStateChangeLog.isDebugEnabled()) {
    blockStateChangeLog.debug("*BLOCK* NameNode.blockReport: "
         + "from " + nodeReg + ", reports.length=" + reports.length);
  }
  final BlockManager bm = namesystem.getBlockManager(); 
  boolean noStaleStorages = false;
  for (int r = 0; r < reports.length; r++) {
    final BlockListAsLongs blocks = reports[r].getBlocks();
    //
    // BlockManager.processReport accumulates information of prior calls
    // for the same node and storage, so the value returned by the last
    // call of this loop is the final updated value for noStaleStorage.
    //
    final int index = r;
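    // Note (my annotation, not in the original source): the BR lease validity
    // is re-checked inside bm.processReport() for every storage report, so a
    // lease that expires mid-loop silently drops the remaining reports.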
    noStaleStorages = bm.runBlockOp(new Callable<Boolean>() {
      @Override
      public Boolean call() throws IOException {
        return bm.processReport(nodeReg, reports[index].getStorage(),
            blocks, context);
      }
    });
    metrics.incrStorageBlockReportOps();
  }
  bm.removeBRLeaseIfNeeded(nodeReg, context);
{code}
The current behavior is that each of the 12 reports is processed in a loop, and before a report
is processed, the validity of the lease is checked. If the lease has expired, that report
is silently skipped.

 
{code:java}
public boolean processReport(final DatanodeID nodeID,
    final DatanodeStorage storage,
    final BlockListAsLongs newReport,
    BlockReportContext context) throws IOException {
  namesystem.writeLock();
  final long startTime = Time.monotonicNow(); //after acquiring write lock
  final long endTime;
  DatanodeDescriptor node;
  Collection<Block> invalidatedBlocks = Collections.emptyList();
  String strBlockReportId =
      context != null ? Long.toHexString(context.getReportId()) : "";


  // ..... (elided) .....

if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
        context.getLeaseId())) {
    return false; // <------- returns here without processing the FBR below
  }
}

if (storageInfo.getBlockReportCount() == 0) {
  // The first block report can be processed a lot more efficiently than
  // ordinary block reports.  This shortens restart times.
  blockLog.info("BLOCK* processReport 0x{}: Processing first "
          + "storage report for {} from datanode {}",
      strBlockReportId,
      storageInfo.getStorageID(),
      nodeID.getDatanodeUuid());
  processFirstBlockReport(storageInfo, newReport);
} else {
  invalidatedBlocks = processReport(storageInfo, newReport, context);
}
{code}
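
Not a proper patch, but to make the idea concrete: one possible direction (purely a sketch on my side; the {{checkBlockReportLease}} helper below is hypothetical, not existing API) would be to validate the lease once per {{blockReport}} RPC, before the per-storage loop, so that a lease expiring midway rejects the whole FBR instead of silently dropping a suffix of it:
{code:java}
// Sketch only (untested), mirroring the NameNodeRpcServer#blockReport excerpt
// above. checkBlockReportLease(...) is a hypothetical helper, not existing API.
final BlockManager bm = namesystem.getBlockManager();
boolean noStaleStorages = false;
if (context != null && !bm.checkBlockReportLease(context, nodeReg)) {
  // Bail out before touching any storage report; how the DN gets told to
  // retry with a fresh lease is a separate question.
  return null;
}
for (int r = 0; r < reports.length; r++) {
  final BlockListAsLongs blocks = reports[r].getBlocks();
  final int index = r;
  noStaleStorages = bm.runBlockOp(new Callable<Boolean>() {
    @Override
    public Boolean call() throws IOException {
      return bm.processReport(nodeReg, reports[index].getStorage(),
          blocks, context);
    }
  });
  metrics.incrStorageBlockReportOps();
}
bm.removeBRLeaseIfNeeded(nodeReg, context);
{code}
An alternative would be to only check the lease for the first storage report within a given RPC and let the remaining reports of the same FBR through, but I haven't thought through the implications of either approach.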



> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
>                 Key: HDFS-12914
>                 URL: https://issues.apache.org/jira/browse/HDFS-12914
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for conditions such
as "unknown datanode", "not in pending set", "lease has expired", wrong lease id, etc.  Lease
rejection does not throw an exception.  It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}}
and is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected due to an invalid lease becomes active with
_no blocks_.  A replication storm ensues, possibly causing DNs to temporarily go dead (HDFS-12645),
leading to more FBR lease rejections on re-registration.  The cluster will have many "missing
blocks" until the DN's next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

