hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1955) HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
Date Tue, 28 Jun 2011 23:13:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056878#comment-13056878 ]

Daryn Sharp commented on HDFS-1955:
-----------------------------------

+FSEditLog+
{code}
  public synchronized void errorOccurred(StorageDirectory sd)
      throws IOException {
    if (editStreams == null) {
{code}

Should this call {{getNumEditStreams()}} or, better yet, {{isOpen()}}, which returns
{{false}} if there are no edit streams?

+FSImage+
{code}
    storage.reportErrorsOnDirectories(errorSDs);
    if (!errorSDs.isEmpty()) {
{code}

Would it make sense to move the call to {{reportErrorsOnDirectories}} inside the {{if}}?  Other
callers tend not to call the method unconditionally.

This isn't strictly related to your change, but a question came up while tracing the
code.  I'm not a Java threading expert, but is there a race condition here?

{code}
  private void waitForThreads(List<Thread> threads) {
    for (Thread thread : threads) {
      while (thread.isAlive()) {
        try {
          thread.join();
        } catch (InterruptedException iex) {
          LOG.error("Caught exception while waiting for thread " +
                    thread.getName() + " to finish. Retrying join");
        }        
      }
    }
  }
{code}

Can {{isAlive}} return {{false}} because the thread has already terminated before {{waitForThreads}}
is invoked?  I ask because {{join}} would then never be called on it; won't the thread be left in
limbo?  If so, should the {{while}} be a {{do-while}}?
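
One way to probe this question is a small standalone sketch, shown here in plain Java rather than Hadoop code. The class and method names ({{JoinAfterExitSketch}}, {{finishedThreadIsSkippedSafely}}, {{unstartedThreadIsSkipped}}) are made up for illustration; only {{waitForThreads}} mirrors the method quoted above, with the logging dropped. It checks what {{isAlive}} and {{getState}} report for a thread that finished before the wait, and for one that was never started.

```java
import java.util.Collections;
import java.util.List;

public class JoinAfterExitSketch {

    // Same shape as the waitForThreads quoted above, with the logging dropped.
    static void waitForThreads(List<Thread> threads) {
        for (Thread thread : threads) {
            while (thread.isAlive()) {
                try {
                    thread.join();
                } catch (InterruptedException iex) {
                    // Retry the join, as the quoted code does.
                }
            }
        }
    }

    // A thread that has already finished: isAlive() is false, so the loop
    // body is skipped, and the thread is already in the TERMINATED state.
    static boolean finishedThreadIsSkippedSafely() {
        Thread worker = new Thread(() -> { });
        worker.start();
        try {
            worker.join(); // make sure the worker is done before the check
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        boolean aliveAfterExit = worker.isAlive();
        waitForThreads(Collections.singletonList(worker));
        return !aliveAfterExit
            && worker.getState() == Thread.State.TERMINATED;
    }

    // A thread that was never started is also "not alive", so the loop
    // skips it without waiting and it stays in the NEW state.
    static boolean unstartedThreadIsSkipped() {
        Thread neverStarted = new Thread(() -> { });
        waitForThreads(Collections.singletonList(neverStarted));
        return !neverStarted.isAlive()
            && neverStarted.getState() == Thread.State.NEW;
    }

    public static void main(String[] args) {
        System.out.println("finished thread skipped safely: "
            + finishedThreadIsSkippedSafely());
        System.out.println("unstarted thread skipped: "
            + unstartedThreadIsSkipped());
    }
}
```

In this sketch the already-finished thread needs no further {{join}}: Java threads do not have to be joined to be cleaned up, and {{join}} on a thread that is not alive returns immediately, so a {{do-while}} would behave the same way. The case the loop cannot cover is a thread that has not been started yet, which holds for the sketch only; whether it can occur depends on how the callers of {{waitForThreads}} start their threads.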

> HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
> -----------------------------------------------------
>
>                 Key: HDFS-1955
>                 URL: https://issues.apache.org/jira/browse/HDFS-1955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: hdfs-1955_1.patch, hdfs-1955_1.patch
>
>
> Prior to HDFS-1826, doUpgrade() would fail if any of the storage directories failed to
> successfully write the new fsimage or edits files.
> Now it appears to "succeed" even if some or all of the individual directories fail.
> There is some discussion about whether doUpgrade() should have some fault tolerance,
> but for now make it fail on any single storage directory failure, as before.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
