hadoop-hdfs-issues mailing list archives

From "Gang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12821) Block invalid IOException causes the DFSClient domain socket being disabled
Date Thu, 16 Nov 2017 03:55:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254712#comment-16254712 ]

Gang Xie commented on HDFS-12821:
---------------------------------

Yes, this should be the same issue.

> Block invalid IOException causes the DFSClient domain socket being disabled
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12821
>                 URL: https://issues.apache.org/jira/browse/HDFS-12821
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.4.0, 2.6.0
>            Reporter: Gang Xie
>
> We use HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient domain socket is
> disabled when the datanode throws a "block invalid" exception.
> The block is invalidated for some reason on the datanode, which is OK. The DFSClient then
> tries to access this block on this datanode via the domain socket, which triggers an
> IOException. On the DFSClient side, when it gets an IOException with error code 'ERROR',
> it disables the domain socket and falls back to TCP. Worst of all, it never seems to
> recover the socket.
> I think this is a defect: for such a "block invalid" exception we should not disable
> the domain socket, because there is nothing wrong with the domain socket service.
> Any thoughts?
> The code:
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>         Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelper.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte buf[] = new byte[1];
>     FileInputStream fis[] = new FileInputStream[2];
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);
>     } catch (IOException e) {
>       // This indicates an error reading from disk, or a format error.  Since
>       // it's not a socket communication problem, we return null rather than
>       // throwing an exception.
>       LOG.warn(this + ": error creating ShortCircuitReplica.", e);
>       return null;
>     } finally {
>       if (replica == null) {
>         IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
>       }
>     }
>     return new ShortCircuitReplicaInfo(replica);
>   case ERROR_UNSUPPORTED:
>     if (!resp.hasShortCircuitAccessVersion()) {
>       LOG.warn("short-circuit read access is disabled for " +
>           "DataNode " + datanode + ".  reason: " + resp.getMessage());
>       clientContext.getDomainSocketFactory()
>           .disableShortCircuitForPath(pathInfo.getPath());
>     } else {
>       LOG.warn("short-circuit read access for the file " +
>           fileName + " is disabled for DataNode " + datanode +
>           ".  reason: " + resp.getMessage());
>     }
>     return null;
>   case ERROR_ACCESS_TOKEN:
>     String msg = "access control error while " +
>         "attempting to set up short-circuit access to " +
>         fileName + resp.getMessage();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(this + ":" + msg);
>     }
>     return new ShortCircuitReplicaInfo(new InvalidToken(msg));
>   default:
>     LOG.warn(this + ": unknown response code " + resp.getStatus() +
>         " while attempting to set up short-circuit access. " +
>         resp.getMessage());
>     // Highlighted by the reporter: this call disables short-circuit reads for the path.
>     clientContext.getDomainSocketFactory()
>         .disableShortCircuitForPath(pathInfo.getPath());
>     return null;
>   }
> }
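
For illustration only, here is a minimal sketch of the behavior suggested in the
description, assuming the datanode reports the invalidated block with status ERROR
as described above. The class, enum, and method names are hypothetical stand-ins,
not the actual Hadoop classes and not a committed fix: a per-block error leaves the
domain socket path enabled, and only an unrecognized response disables it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ShortCircuitErrorSketch {

  // Hypothetical subset of response codes; the real values come from the
  // protobuf status carried by BlockOpResponseProto.
  enum Status { ERROR, ERROR_ACCESS_TOKEN, OTHER }

  // Hypothetical stand-in for the per-path state kept by the domain socket factory.
  static final class PathState {
    private final AtomicBoolean disabled = new AtomicBoolean(false);

    void disableShortCircuit() {
      disabled.set(true);
    }

    boolean isShortCircuitDisabled() {
      return disabled.get();
    }
  }

  // Handles a non-SUCCESS response. Returns true if the path was disabled.
  static boolean onFailedResponse(Status status, PathState path) {
    switch (status) {
      case ERROR:
        // Assumption: a plain ERROR concerns this block only (for example,
        // the block was invalidated), so the socket path stays enabled.
      case ERROR_ACCESS_TOKEN:
        // A token problem is likewise unrelated to the socket itself.
        return false;
      default:
        // Anything unrecognized still disables short-circuit reads,
        // matching the default branch of the quoted code.
        path.disableShortCircuit();
        return true;
    }
  }

  public static void main(String[] args) {
    PathState path = new PathState();
    onFailedResponse(Status.ERROR, path);
    System.out.println(path.isShortCircuitDisabled()); // prints "false"
    onFailedResponse(Status.OTHER, path);
    System.out.println(path.isShortCircuitDisabled()); // prints "true"
  }
}
```

In this sketch the caller would still return null for the failed request, so the
read falls back to TCP for that one block without affecting later blocks.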



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

