hadoop-hdfs-issues mailing list archives

From "sam rash (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
Date Tue, 22 Jun 2010 00:57:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880989#action_12880989 ]

sam rash commented on HDFS-1218:
--------------------------------

I realize the Hadoop code already swallows InterruptedException frequently, but I think
you can change the trend here:

{code}
        // wait for all acks to be received back from datanodes
        synchronized (ackQueue) {
          if (!closed && ackQueue.size() != 0) {
            try {
              ackQueue.wait();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();  // add this: restore the interrupt status
            }
            continue;
          }
        }
{code}

Otherwise, it's very easy to have a thread that I own and manage, with a DFSOutputStream
inside it, that swallows an interrupt. When I check Thread.currentThread().isInterrupted()
to see whether one of my other threads has interrupted me, I won't see it.

(The crux is that swallowing interrupts in threads that Hadoop controls is less harmful; this
code runs directly in the client's own thread when you call sync()/close().)
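
To make the caller-side problem concrete, here's a minimal sketch (plain Java, no HDFS
classes; InterruptAwareWorker and doWrites() are made-up stand-ins for client code that
writes through a DFSOutputStream):

{code}
// Hypothetical sketch: a worker thread that polls its own interrupt
// status to decide when to stop. If a call on its code path (e.g.
// sync()/close()) catches InterruptedException and swallows it
// without calling Thread.currentThread().interrupt(), this loop
// never observes the interrupt and never exits.
public class InterruptAwareWorker implements Runnable {
  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      doWrites();  // stand-in for code that calls sync()/close()
    }
    // reached only if the interrupt status survives doWrites()
    System.out.println("worker observed interrupt, shutting down");
  }

  private void doWrites() {
    // Placeholder for client writes. A swallowed InterruptedException
    // inside a library call here clears the thread's interrupt flag
    // and hides the shutdown request from the loop condition above.
  }

  public static void main(String[] args) throws InterruptedException {
    Thread t = new Thread(new InterruptAwareWorker());
    t.start();
    Thread.sleep(100);
    t.interrupt();  // owner requests shutdown via interrupt
    t.join();       // returns promptly only if the flag is preserved
  }
}
{code}

Restoring the flag with Thread.currentThread().interrupt() in the catch block keeps this
contract intact: the library reacts to the interrupt, but the owning thread still gets to
see it.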


> 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1218
>                 URL: https://issues.apache.org/jira/browse/HDFS-1218
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.20-append
>
>         Attachments: hdfs-1281.txt
>
>
> When a datanode experiences power loss, it can come back up with truncated replicas (due to local FS journal replay). Those replicas should not be allowed to truncate the block during block synchronization if there are other replicas from DNs that have _not_ restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

