hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
Date Mon, 18 Jun 2012 12:37:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395862#comment-13395862 ]

Vinay commented on HDFS-3541:
-----------------------------

From the attached thread dump, the recoverBlock(..) call is waiting to join the writer thread
in ReplicaInPipeline#stopWriter().

{code}  public void stopWriter() throws IOException {
    if (writer != null && writer != Thread.currentThread() && writer.isAlive()) {
      writer.interrupt();
      try {
        writer.join();
      } catch (InterruptedException e) {
        throw new IOException("Waiting for writer thread is interrupted.");
      }
    }
  }{code}

FSDataSetImpl#initReplicaRecovery calls the above method, but it has already locked the
FSDataSet at that point.

In the current thread dump, the writer thread is one of the DataXceiver threads, each of which
is waiting on its respective PacketResponder thread.

# Here *writer.interrupt()* interrupts the thread only if it is in a waiting/sleeping state; otherwise it just sets the interrupt flag and does not actually stop the thread. So *writer.join()* waits until the thread completes its execution.
# The writer thread is a DataXceiver thread, which is waiting to join its PacketResponder thread.
# The PacketResponders are waiting on the *fsdataset* lock to finalize the block.

So it's a deadlock.
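
The three-way wait described above can be reproduced in miniature outside HDFS. The following is only a sketch under stated assumptions: the class, method, and lock names are hypothetical, the "writer" is modeled as a thread that swallows interrupts while joining its responder (to mirror point 1), and the recovery side uses a bounded join so the demo terminates instead of hanging the way the real join() would.

```java
import java.util.concurrent.CountDownLatch;

class DeadlockCycleSketch {
    // Hypothetical stand-in for the fsdataset lock.
    private static final Object DATASET_LOCK = new Object();

    /** Returns true if the writer is still alive after interrupt plus a bounded join. */
    static boolean writerStuckWhileLockHeld() {
        CountDownLatch lockHeld = new CountDownLatch(1);

        // "PacketResponder": needs the dataset lock to finalize the block.
        Thread responder = new Thread(() -> {
            try {
                lockHeld.await(); // start only once "recovery" holds the lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (DATASET_LOCK) {
                // finalize the block (no-op in this sketch)
            }
        });

        // "DataXceiver" writer: waits for its responder.
        // Assumption for illustration: an interrupt does not make it give up.
        Thread writer = new Thread(() -> {
            while (responder.isAlive()) {
                try {
                    responder.join();
                } catch (InterruptedException ignored) {
                    // interrupt swallowed; keep waiting for the responder
                }
            }
        });

        responder.start();
        writer.start();

        boolean stuck;
        synchronized (DATASET_LOCK) { // "recovery" holds the lock, then stops the writer
            lockHeld.countDown();
            writer.interrupt();
            try {
                writer.join(300); // bounded here; an unbounded join() would never return
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            stuck = writer.isAlive();
        }

        // Lock released: the responder can finalize, so the writer finishes too.
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("writer stuck while lock held: " + writerStuckWhileLockHeld());
    }
}
```

As long as the lock holder waits on the writer, the writer waits on the responder, and the responder waits on the lock, none of the three can make progress; the sketch only escapes because its join is bounded and the lock is released afterwards.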

Here ReplicaInPipeline#stopWriter() should ensure that the writer thread has actually been interrupted.

The following changes should work in this case:
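{code}  public void stopWriter() throws IOException {
    if (writer != null && writer != Thread.currentThread()) {
      // Re-interrupt until the writer actually exits, in case an interrupt is missed.
      while (writer.isAlive()) {
        writer.interrupt();
        try {
          // Bounded join; Object#wait(100) here would require holding writer's monitor.
          writer.join(100);
        } catch (InterruptedException e) {
          throw new IOException("Waiting for writer thread is interrupted.");
        }
      }
    }
  }{code}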
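
The retry idea can be tried in isolation. This is a minimal sketch, not the HDFS code: the class and method names are hypothetical, and the worker is deliberately written to swallow its first interrupt so that a single interrupt() followed by an unbounded join() would wait for its full sleep. The bounded wait uses Thread#join(long), since Object#wait(long) may only be called while holding the target object's monitor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

class StopWriterSketch {
    /** Interrupts the thread repeatedly, waiting at most 100 ms between attempts. */
    static void stopThread(Thread writer) throws IOException {
        if (writer != null && writer != Thread.currentThread()) {
            while (writer.isAlive()) {
                writer.interrupt();
                try {
                    writer.join(100); // returns early if the thread dies
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                    throw new IOException("Waiting for writer thread is interrupted.");
                }
            }
        }
    }

    /** Runs a worker that only exits after its second interrupt; returns the interrupts it saw. */
    static int demo() {
        AtomicInteger interruptsSeen = new AtomicInteger();
        Thread worker = new Thread(() -> {
            while (interruptsSeen.get() < 2) {
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    interruptsSeen.incrementAndGet(); // first interrupt is swallowed
                }
            }
        });
        worker.start();
        try {
            stopThread(worker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return interruptsSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupts seen by worker: " + demo());
    }
}
```

Note that re-interrupting only helps when the target thread can react to an interrupt; a thread blocked on entering a synchronized block does not respond to interrupt() at all.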
                
> Deadlock between recovery, xceiver and packet responder
> -------------------------------------------------------
>
>                 Key: HDFS-3541
>                 URL: https://issues.apache.org/jira/browse/HDFS-3541
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.1-alpha
>            Reporter: suja s
>            Assignee: Vinay
>         Attachments: DN_dump.rar
>
>
> Block Recovery initiated while write in progress at Datanode side. Found a lock between
> recovery, xceiver and packet responder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
