hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1955) HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
Date Tue, 28 Jun 2011 23:13:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056878#comment-13056878 ]

Daryn Sharp commented on HDFS-1955:

  public synchronized void errorOccurred(StorageDirectory sd)
      throws IOException {
    if (editStreams == null) {

Should this call {{getNumEditStreams()}} or, better yet, {{isOpen()}}, which returns
{{false}} if there are no edit streams?
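
To make the difference concrete, here is a minimal sketch of why a bare {{null}} check and {{isOpen()}} can disagree. Only the method names {{getNumEditStreams()}} and {{isOpen()}} come from the code under review; the class {{EditLogSketch}}, the {{String}} element type, the helper {{nullCheckSaysClosed()}}, and the method bodies are assumptions made for illustration and may not match the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class EditLogSketch {
    // Hypothetical stand-in for the real stream list; may be null or empty.
    private final List<String> editStreams;

    EditLogSketch(List<String> editStreams) {
        this.editStreams = editStreams;
    }

    // Assumed behavior: a null list counts as zero streams.
    int getNumEditStreams() {
        return editStreams == null ? 0 : editStreams.size();
    }

    // Matches the description above: false if there are no edit streams.
    boolean isOpen() {
        return getNumEditStreams() > 0;
    }

    // Hypothetical helper mirroring the "editStreams == null" test in the excerpt.
    boolean nullCheckSaysClosed() {
        return editStreams == null;
    }

    public static void main(String[] args) {
        // An empty but non-null list: the null check does not treat it as closed.
        EditLogSketch empty = new EditLogSketch(new ArrayList<>());
        System.out.println(empty.nullCheckSaysClosed()); // false
        System.out.println(empty.isOpen());              // false
    }
}
```

Under these assumptions, an empty but non-null list slips past the {{null}} check, while {{isOpen()}} treats it the same as having no streams at all.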

    if (!errorSDs.isEmpty()) {

Would it make sense to move the call to {{reportErrorOnDirectories}} inside the {{if}}?  Other
callers tend not to call the method unconditionally.

This isn't strictly related to your change, but it is a question/observation from tracing the
code.  I'm not a Java threading expert, but is there a race condition here?

  private void waitForThreads(List<Thread> threads) {
    for (Thread thread : threads) {
      while (thread.isAlive()) {
        try {
          thread.join();
        } catch (InterruptedException iex) {
          LOG.error("Caught exception while waiting for thread " +
                    thread.getName() + " to finish. Retrying join");
        }
      }
    }
  }

Can {{isAlive}} return {{false}} because the thread has already terminated before {{waitForThreads}}
is invoked?  I ask because, if so, won't the thread be left in limbo, never having been joined?
In that case, should the {{while}} be a {{do-while}}?
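
One way to check this is a small standalone experiment. The sketch below is an assumption-laden test harness, not the HDFS code: the class name {{WaitForThreadsSketch}} and the helper {{finishedThread()}} are made up, and the loop only mirrors the shape of {{waitForThreads}} above without the logging. It relies on documented {{java.lang.Thread}} behavior: {{isAlive()}} is {{false}} once a thread has terminated, and {{join()}} on a terminated thread returns immediately.

```java
import java.util.Collections;
import java.util.List;

public class WaitForThreadsSketch {

    // Same shape as the excerpt above: keep joining while the thread is alive.
    static void waitForThreads(List<Thread> threads) {
        for (Thread thread : threads) {
            while (thread.isAlive()) {
                try {
                    thread.join();
                } catch (InterruptedException iex) {
                    // Interrupted while waiting: loop around and retry the join.
                }
            }
        }
    }

    // Hypothetical helper: starts a trivial thread and returns after it has finished.
    static Thread finishedThread() {
        Thread t = new Thread(() -> { });
        t.start();
        waitForThreads(Collections.singletonList(t));
        return t;
    }

    public static void main(String[] args) {
        Thread done = finishedThread();
        System.out.println(done.isAlive());  // false: the thread has terminated
        System.out.println(done.getState()); // TERMINATED
        // Second call: isAlive() is already false, so the loop body never runs.
        waitForThreads(Collections.singletonList(done));
    }
}
```

If this reading is right, skipping {{join()}} for a thread that has already finished leaves nothing pending, so the plain {{while}} would be sufficient, and a {{do-while}} would only add a {{join()}} call that returns immediately.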

> HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
> -----------------------------------------------------
>                 Key: HDFS-1955
>                 URL: https://issues.apache.org/jira/browse/HDFS-1955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: hdfs-1955_1.patch, hdfs-1955_1.patch
> Prior to HDFS-1826, doUpgrade() would fail if any of the storage directories failed to
> successfully write the new fsimage or edits files.
> Now it appears to "succeed" even if some or all of the individual directories fail.
> There is some discussion about whether doUpgrade() should have some fault tolerance,
> but for now make it fail on any single storage directory failure, as before.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

