hadoop-hdfs-issues mailing list archives

From "Ding Yuan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5931) Potential bugs and improvements for exception handlers
Date Tue, 11 Feb 2014 20:26:19 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ding Yuan updated HDFS-5931:
----------------------------

    Attachment: hdfs-5931.patch

> Potential bugs and improvements for exception handlers
> ------------------------------------------------------
>
>                 Key: HDFS-5931
>                 URL: https://issues.apache.org/jira/browse/HDFS-5931
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.2.0
>            Reporter: Ding Yuan
>         Attachments: hdfs-5931.patch
>
>
> Hi,
> This is to report some improvements and potential bug fixes to error handling code.
> A patch is attached for review.
> ==========================
> Case 1:
> org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
> In recoverUnclosedStreams, purgeLogsOlderThan, and endCurrentLogSegment,
> mapJournalsAndReportErrors throws an IOException when not enough journals are available.
> In some other paths (such as logEdit()), this condition is handled by a later call to
> logSync(), which terminates the NameNode. In these three methods, however, the exception
> may be swallowed, because logSync() will not be called anytime soon.
> Proposal: handle such failures immediately, the same way logSync() does.
> {noformat}
>   synchronized void recoverUnclosedStreams() {
>     Preconditions.checkState(
>         state == State.BETWEEN_LOG_SEGMENTS,
>         "May not recover segments - wrong state: %s", state);
>     try {
>       journalSet.recoverUnfinalizedSegments();
>     } catch (IOException ex) {
> -      // All journals have failed, it is handled in logSync.
> -      // TODO: are we sure this is OK?
> +      // All journals have failed; handle it here instead of deferring to logSync().
> +      final String msg = "Could not find enough journals.";
> +      LOG.fatal(msg, new Exception());
> +      IOUtils.cleanup(LOG, journalSet);
> +      terminate(1, msg);
>     }
>   }
> {noformat}
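> For reference, this is the same handling that logSync() already performs when all journals fail (paraphrased from the 2.2.0 source; exact message text and surrounding locking may differ):
> {noformat}
> } catch (IOException ex) {
>   synchronized (this) {
>     final String msg =
>         "Could not sync enough journals to persistent storage. "
>         + "Unsynced transactions: " + (txid - synctxid);
>     LOG.fatal(msg, new Exception());
>     IOUtils.cleanup(LOG, journalSet);
>     terminate(1, msg);
>   }
> }
> {noformat}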
> ==========================================
> ==========================
> Case 2:
> File: "org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java"
> {noformat}
> @@ -446,6 +446,8 @@
>            deleteDir(tmpDir);
>          } catch (IOException ex) {
>            LOG.error("Finalize upgrade for " + dataDirPath + " failed.", ex);
> +          return; // return so users will not see confusing log messages where
> +                  // "failed" is immediately followed by "complete"
>          }
>          LOG.info("Finalize upgrade for " + dataDirPath + " is complete.");
>        }
> {noformat}
> In this case, the log can be confusing if deleteDir fails: the
> "Finalize upgrade ... failed." message is immediately followed by
> "Finalize upgrade ... is complete."
> The same pattern can be seen at:
>   Line: 600, File: "org/apache/hadoop/hdfs/server/datanode/DataStorage.java"
>   Line: 1250, File: "org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java"
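> The fix at those sites would follow the same shape. A generic sketch of the pattern (hypothetical method and names, not the actual code from those files):
> {noformat}
> void finalizeUpgrade(File dir) {
>   try {
>     deleteDir(dir);  // cleanup step that may throw IOException
>   } catch (IOException e) {
>     LOG.error("Finalize upgrade for " + dir + " failed.", e);
>     return;  // do not fall through to the success message below
>   }
>   LOG.info("Finalize upgrade for " + dir + " is complete.");
> }
> {noformat}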
> ==========================================
> ==========================
> Case 3:
> File: "org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java"
> The log message doesn't include the exception cause.
> {noformat}
> @@ -167,9 +167,9 @@
>          LOG.debug(this + " received versionRequest response: " + nsInfo);
>          break;
>        } catch(SocketTimeoutException e) {  // namenode is busy
> -        LOG.warn("Problem connecting to server: " + nnAddr);
> +        LOG.warn("Problem connecting to server: " + nnAddr, e);
>        } catch(IOException e ) {  // namenode is not available
> -        LOG.warn("Problem connecting to server: " + nnAddr);
> +        LOG.warn("Problem connecting to server: " + nnAddr, e);
>        }
> {noformat}
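> The two-argument warn overload records the exception class, message, and stack trace rather than only our own text. A minimal standalone illustration (hypothetical class and address, not HDFS code):
> {noformat}
> import java.io.IOException;
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
>
> public class LogCauseExample {
>   private static final Log LOG = LogFactory.getLog(LogCauseExample.class);
>
>   public static void main(String[] args) {
>     try {
>       throw new IOException("Connection refused");
>     } catch (IOException e) {
>       LOG.warn("Problem connecting to server: nn-host:8020");     // message only, cause lost
>       LOG.warn("Problem connecting to server: nn-host:8020", e);  // message plus stack trace
>     }
>   }
> }
> {noformat}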
> ==========================================
> ==========================
> Case 4:
> File: "org/apache/hadoop/hdfs/server/datanode/BlockSender.java"
> {noformat}
> @@ -371,6 +371,7 @@
>     IOException ioe = null;
>     if (blockInFd != null &&
>         ((dropCacheBehindAllReads) ||
>          (dropCacheBehindLargeReads && isLongRead()))) {
>       try {
>         NativeIO.POSIX.getCacheManipulator().posixFadviseIfPossible(
>             block.getBlockName(), blockInFd, lastCacheDropOffset,
>             offset - lastCacheDropOffset,
>             NativeIO.POSIX.POSIX_FADV_DONTNEED);
>       } catch (IOException e) {
>         LOG.warn("Unable to drop cache on file close", e);
> +       ioe = e;
>       }
>     }
>     if(checksumIn!=null) {
>       try {
>         checksumIn.close(); // close checksum file
>       } catch (IOException e) {
>         ioe = e;
>       }
>       checksumIn = null;
>  .. ..
>     if (ioe!=null) {
>       throw ioe;
>     }
> {noformat}
> It is unclear why the exception from posixFadviseIfPossible is not propagated to the caller.
> ==========================================
> ==========================
> Case 5:
> File: org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
> {noformat}
>     } finally {
>       dataXceiverServer.balanceThrottler.release();
>       if (isOpSuccess) {
> @@ -751,7 +751,8 @@
>          try {
>            // send one last byte to indicate that the resource is cleaned.
>            reply.writeChar('d');
> -        } catch (IOException ignored) {
> +        } catch (IOException e) {
> +          LOG.warn("Exception caught after copy block succeeded", e);
>          }
>        }
>        IOUtils.closeStream(reply);
> {noformat}
> It is unclear why the IOException is not logged.
> Similar case:
>   Line: 876, File: "org/apache/hadoop/hdfs/server/datanode/DataXceiver.java"
> ==========================================



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
