hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11974) Fsimage transfer failed due to socket timeout, but logs doesn't show that
Date Mon, 19 Jun 2017 03:27:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053440#comment-16053440 ]

Yongjun Zhang commented on HDFS-11974:
--------------------------------------

A further thought.

The reported exception was thrown here:
{code}
      if (finishedReceiving && received != advertisedSize) {
        // only throw this exception if we think we read all of it on our end
        // -- otherwise a client-side IOException would be masked by this
        // exception that makes it look like a server-side problem!
        deleteTmpFiles(localPaths);
        throw new IOException("File " + url + " received length " + received +
            " is not of the advertised size " + advertisedSize +
            ". Fsimage name: " + fsImageName + " lastReceived: " + num);
      }
{code}
so {{finishedReceiving}} must have been true. It is set to true only after the following loop finishes without throwing:
{code}
      byte[] buf = new byte[IO_FILE_BUFFER_SIZE];
      while (num > 0) {
        num = stream.read(buf);
        if (num > 0) {
          received += num;
          for (FileOutputStream fos : outputStreams) {
            fos.write(buf, 0, num);
          }
          if (throttler != null) {
            throttler.throttle(num);
          }
        }
      }
      finishedReceiving = true;
{code}

This is puzzling: if a socket timeout exception had occurred, it should have been thrown inside
the loop above, and {{finishedReceiving}} would not have been set to true. Conversely, since
{{finishedReceiving}} was set to true, presumably no exception was thrown in that loop.
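
To make the two paths concrete, here is a minimal standalone sketch. It is not the Hadoop code: the class {{ReceiveLoopSketch}}, the helper {{timingOut}}, and the 4-byte buffer are made up for illustration, and it does not establish what happened in the reported case. It only shows that a timeout thrown by {{read}} skips the {{finishedReceiving = true}} assignment, while a stream that simply ends early (so {{read}} returns -1) leaves the loop normally with fewer bytes than expected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

public class ReceiveLoopSketch {

  /** Outcome of one simulated transfer. */
  static final class Result {
    final boolean finishedReceiving;
    final long received;
    final IOException thrown;

    Result(boolean finishedReceiving, long received, IOException thrown) {
      this.finishedReceiving = finishedReceiving;
      this.received = received;
      this.thrown = thrown;
    }
  }

  /** Same loop shape as the snippet above, without file output or throttling. */
  static Result receive(InputStream stream) {
    boolean finishedReceiving = false;
    long received = 0;
    IOException thrown = null;
    try {
      byte[] buf = new byte[4]; // small illustrative buffer
      int num = 1;
      while (num > 0) {
        num = stream.read(buf);
        if (num > 0) {
          received += num;
        }
      }
      // Reached only if no read() call threw.
      finishedReceiving = true;
    } catch (IOException e) {
      thrown = e;
    }
    return new Result(finishedReceiving, received, thrown);
  }

  /** Hypothetical stream that delivers its data and then times out instead of ending. */
  static InputStream timingOut(final byte[] data) {
    return new InputStream() {
      private int pos = 0;

      @Override
      public int read() throws IOException {
        if (pos >= data.length) {
          throw new SocketTimeoutException("Read timed out");
        }
        return data[pos++] & 0xff;
      }
    };
  }

  public static void main(String[] args) {
    // Path 1: the read times out, so the flag stays false.
    Result timedOut = receive(timingOut(new byte[6]));
    System.out.println("timeout:   finishedReceiving=" + timedOut.finishedReceiving
        + " received=" + timedOut.received + " thrown=" + timedOut.thrown);

    // Path 2: the stream ends after 6 bytes; the loop exits normally and the
    // flag is true even if more bytes had been advertised.
    Result truncated = receive(new ByteArrayInputStream(new byte[6]));
    System.out.println("early EOF: finishedReceiving=" + truncated.finishedReceiving
        + " received=" + truncated.received + " thrown=" + truncated.thrown);
  }
}
```

In the second path a size check like the one in the first snippet would fire with {{finishedReceiving}} true, without any exception having been thrown on the receiving side.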






> Fsimage transfer failed due to socket timeout, but logs doesn't show that
> -------------------------------------------------------------------------
>
>                 Key: HDFS-11974
>                 URL: https://issues.apache.org/jira/browse/HDFS-11974
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> The idea of HDFS-11914 is to add more diagnostic information to understand what happened when we saw
> {code}
> WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: File http://x.y.z:50070/imagetransfer?getimage=1&txid=latest received length xyz is not of the advertised size abc.
> {code}
> After further study, I realized that the above exception is thrown in the {{finally}} block of the {{TransferFsImage#receiveFile}} method, so any other exception thrown in the main code, such as a SocketTimeoutException, is not reported.
> We should include information about any exception thrown in the main code when throwing an exception in the {{finally}} block.
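
The masking described in the quoted summary is ordinary Java behavior and can be seen in a generic sketch. This is not the {{TransferFsImage}} code and not necessarily how the issue was fixed: the class {{FinallyMaskingSketch}}, both method names, and the shortened message text are assumptions for illustration. The second method shows one possible way to keep the original exception visible, by attaching it with {{Throwable#addSuppressed}}.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class FinallyMaskingSketch {

  /** The throw in finally replaces the timeout thrown in the try body. */
  static void masking() throws IOException {
    try {
      throw new SocketTimeoutException("Read timed out");
    } finally {
      throw new IOException("received length is not of the advertised size");
    }
  }

  /** Records the first failure and attaches it to the exception thrown in finally. */
  static void preserving() throws IOException {
    IOException primary = null;
    try {
      throw new SocketTimeoutException("Read timed out");
    } catch (IOException e) {
      primary = e; // remember what went wrong in the main code
    } finally {
      IOException sizeMismatch =
          new IOException("received length is not of the advertised size");
      if (primary != null) {
        sizeMismatch.addSuppressed(primary);
      }
      throw sizeMismatch;
    }
  }

  public static void main(String[] args) {
    try {
      masking();
    } catch (IOException e) {
      System.out.println("masking:    " + e + " suppressed=" + e.getSuppressed().length);
    }
    try {
      preserving();
    } catch (IOException e) {
      System.out.println("preserving: " + e + " suppressed=" + e.getSuppressed().length);
    }
  }
}
```

With the second form, a stack trace printed for the size-mismatch exception also lists the timeout under "Suppressed:", so the log would no longer hide it.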



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


