hadoop-hdfs-issues mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6286) adding a timeout setting for local read io
Date Mon, 26 May 2014 06:57:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008621#comment-14008621 ]

Liang Xie commented on HDFS-6286:

bq. Yes, hedged reads only work for pread() now. We ought to extend it to all forms of read().
This will be a big latency win across the board, and not only for local reads.
It seems no separate issue has been filed for that yet, right?
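For context, the hedged-read idea mentioned above, issuing a speculative second read and taking whichever finishes first, can be sketched with a CompletionService. This is a minimal illustration, not HDFS code; all names (hedgedRead, the pool, the delay) are made up for the sketch:

```java
import java.util.concurrent.*;

public class HedgedReadSketch {
  // Run the primary read; if it has not finished within hedgeDelayMs,
  // also submit a backup read and return whichever completes first.
  static int hedgedRead(ExecutorService pool,
                        Callable<Integer> primary,
                        Callable<Integer> backup,
                        long hedgeDelayMs) throws Exception {
    CompletionService<Integer> ecs = new ExecutorCompletionService<>(pool);
    ecs.submit(primary);
    // Give the primary a head start before hedging.
    Future<Integer> first = ecs.poll(hedgeDelayMs, TimeUnit.MILLISECONDS);
    if (first != null) {
      return first.get();
    }
    ecs.submit(backup);
    return ecs.take().get();  // first read to complete wins
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Callable<Integer> slow = () -> { Thread.sleep(5000); return 1; };
    Callable<Integer> fast = () -> 2;
    // The slow "replica" exceeds the 100 ms hedge delay, so the
    // backup read is issued and its result is returned.
    System.out.println(hedgedRead(pool, slow, fast, 100));
    pool.shutdownNow();
  }
}
```

The losing read is simply abandoned here; the real client would also need to cancel it or discard its result safely.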

Just a minor update: I wrote some code following my earlier proposal and ran a simple
test; it works. Most of the changes are in BlockReaderLocal.read(), replacing the direct dataIn.read(buf, off, len) call with:
      Callable<Integer> readCallable = new Callable<Integer>() {
        @Override
        public Integer call() throws Exception {
          return dataIn.read(buf, off, len);
        }
      };
      Future<Integer> future;
      try {
        future = localReadPool.submit(readCallable);
      } catch (RejectedExecutionException e) {
        // The pool refused the task (e.g. it is shutting down);
        // surface the runtime exception as an IOException.
        LOG.warn("local read task rejected", e);
        throw new IOException(e);
      }
      long timeout = localReadTimeoutMs > 0 ? localReadTimeoutMs : 10000L;
      try {
        return future.get(timeout, TimeUnit.MILLISECONDS).intValue();
      } catch (InterruptedException e) {
        // Probably a concurrent close() interrupted us.
        LOG.warn("local read interrupted", e);
        throw new IOException(e);
      } catch (ExecutionException e) {
        // The real read I/O error, unwrapped from the worker thread.
        LOG.warn("local read failed", e);
        throw new IOException(e);
      } catch (TimeoutException e) {
        LOG.warn("read timeout: " + timeout + " ms. GC pause? bad disk?", e);
        throw new IOException(e);
      }
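The snippet above depends on HDFS internals (dataIn, localReadPool, LOG, the surrounding read method). A self-contained sketch of the same future-with-timeout pattern, with purely illustrative names, looks like this:

```java
import java.io.IOException;
import java.util.concurrent.*;

public class TimedReadSketch {
  // Wrap a potentially hanging blocking read in a Future so the caller
  // gets a bounded wait instead of blocking indefinitely on a sick disk.
  static int readWithTimeout(ExecutorService pool, Callable<Integer> read,
                             long timeoutMs) throws IOException {
    Future<Integer> future;
    try {
      future = pool.submit(read);
    } catch (RejectedExecutionException e) {
      throw new IOException("read task rejected", e);
    }
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException | ExecutionException e) {
      throw new IOException("read failed", e);
    } catch (TimeoutException e) {
      // Best effort: a read stuck in the kernel may not actually stop.
      future.cancel(true);
      throw new IOException("read timed out after " + timeoutMs + " ms", e);
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    // A fast read completes normally...
    System.out.println(readWithTimeout(pool, () -> 42, 1000));
    // ...while a hung read surfaces as an IOException instead of blocking.
    try {
      readWithTimeout(pool, () -> { Thread.sleep(5000); return -1; }, 100);
    } catch (IOException e) {
      System.out.println("timed out");
    }
    pool.shutdownNow();
  }
}
```

Note that the timed-out worker thread keeps blocking until the underlying I/O returns, which is exactly why the issue proposes marking the local node dead rather than retrying it.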

> adding a timeout setting for local read io
> ------------------------------------------
>                 Key: HDFS-6286
>                 URL: https://issues.apache.org/jira/browse/HDFS-6286
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
> Currently, if a write or remote read goes to a sick disk, DFSClient.hdfsTimeout gives
the caller a bounded time before returning, but it does not apply to local reads. Take an
HBase scan, for example:
> DFSInputStream.read -> readWithStrategy -> readBuffer -> BlockReaderLocal.read
->  dataIn.read -> FileChannelImpl.read
> If it hits a bad disk, the low-level read I/O can take tens of seconds, and what's worse,
DFSInputStream.read holds a lock the whole time.
> To my knowledge, there is no good mechanism to cancel a running read I/O (please correct
me if that's wrong), so my suggestion is to wrap the read request in a Future with a
timeout; if the threshold is reached, we could probably also add the local node to the dead-node list...
> Any thoughts?

This message was sent by Atlassian JIRA
