hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
Date Thu, 01 Oct 2015 06:59:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939402#comment-14939402 ]

Rakesh R commented on HDFS-9185:
--------------------------------

Following is my analysis:

# {{ErasureCodingWorker}} creates the {{RemoteBlockReader2}} with a null {{tracer}}. The
{{RemoteBlockReader2#read}} call then hits an NPE when it uses the tracer, which results in the
test failure. To fix this, how about passing {{datanode#getTracer()}} to the reader?
{code}
ErasureCodingWorker.java

        return RemoteBlockReader2.newBlockReader(
            "dummy", block, blockToken, offsetInBlock, 
            block.getNumBytes() - offsetInBlock, true,
            "", newConnectedPeer(block, dnAddr, blockToken, dnInfo), dnInfo,
            null, cachingStrategy, null);
{code}
{code}
RemoteBlockReader2.java

  public synchronized int read(ByteBuffer buf) throws IOException {
    if (curDataSlice == null || curDataSlice.remaining() == 0 && bytesNeededToFinish > 0) {
      TraceScope scope = tracer.newScope(
          "RemoteBlockReader2#readNextPacket(" + blockId + ")");
      try {
        readNextPacket();
      } finally {
        scope.close();
      }
    }
{code}
# The root cause is not visible in the log messages because {{StripedBlockUtil#getNextCompletedStripedRead()}}
logs the exception at {{DEBUG}} level. IMHO the log level should be raised to {{INFO}}
so that the failure reason shows up in the logs.
{code}
      if (DFSClient.LOG.isDebugEnabled()) {
        DFSClient.LOG.debug("ExecutionException " + e);
      }
{code}
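
Both points can be reproduced outside Hadoop with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: {{SketchTracer}}, {{SketchScope}}, {{SketchReader}} and {{TracerSketch}} are hypothetical stand-ins, not the real HDFS or HTrace classes, and {{java.util.logging}} stands in for the client logger ({{FINE}} plays the role of {{DEBUG}}).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for a tracer; NOT the HTrace or Hadoop API. */
class SketchTracer {
    final List<String> opened = new ArrayList<>();

    SketchScope newScope(String description) {
        opened.add(description);
        return new SketchScope();
    }
}

/** Hypothetical stand-in for a trace scope. */
class SketchScope {
    void close() {
        // Nothing to release in this sketch.
    }
}

/** Hypothetical reader that, like the excerpt above, opens a scope on every read. */
class SketchReader {
    private final long blockId;
    private final SketchTracer tracer;

    SketchReader(long blockId, SketchTracer tracer) {
        this.blockId = blockId;
        this.tracer = tracer; // a null here is only detected later, inside read()
    }

    int read() {
        // Throws NullPointerException when the reader was built with a null tracer.
        SketchScope scope = tracer.newScope("SketchReader#readNextPacket(" + blockId + ")");
        try {
            return 1; // pretend one byte was read
        } finally {
            scope.close();
        }
    }
}

class TracerSketch {
    static final Logger LOG = Logger.getLogger("TracerSketch");

    static {
        LOG.setLevel(Level.INFO); // FINE messages are dropped at this level
    }

    /** Returns the simple name of the exception hit by read(), or null on success. */
    static String tryRead(SketchTracer tracer) {
        try {
            new SketchReader(42L, tracer).read();
            return null;
        } catch (RuntimeException e) {
            LOG.fine("dropped at INFO level: " + e); // analogous to logging at DEBUG
            LOG.info("read failed: " + e);           // the failure reason is now visible
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("null tracer -> " + tryRead(null));
        System.out.println("real tracer -> " + tryRead(new SketchTracer()));
    }
}
```

In the sketch, the reader built with a null tracer fails on its first {{read()}}, while the one given a real tracer succeeds. Only the {{INFO}} message reports the reason, which mirrors why the root cause was missing from the test logs.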

I'll soon prepare a patch including these changes.

> TestRecoverStripedFile is failing
> ---------------------------------
>
>                 Key: HDFS-9185
>                 URL: https://issues.apache.org/jira/browse/HDFS-9185
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Critical
>
> Below is the message taken from build:
> {code}
> Error Message
> Time out waiting for EC block recovery.
> Stacktrace
> java.io.IOException: Time out waiting for EC block recovery.
> 	at org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383)
> 	at org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283)
> 	at org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168)
> {code}
> Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
