hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3673) Deadlock in Datanode RPC servers
Date Tue, 01 Jul 2008 20:28:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609699#action_12609699 ]

dhruba borthakur commented on HADOOP-3673:

From an architectural point of view, I think it is good to keep the client as lightweight
as possible. This will allow us to port the client to many other languages. So I
would like to stick with the current design (in which the primary datanode invokes recoverBlock).

What if RPC.Server could be made to support dynamic handler threads? For example, when the
Datanode creates an RPC Server, it could specify a handler count of 0. The RPC Server code would
interpret a count of 0 to mean that the application wants a new thread to service
each and every RPC request. In this case, the RPC Server code would internally create a single
handler thread for this Server. This special handler thread would invoke callQueue.take() to
retrieve a new incoming call, and then fork off a new thread to process that call. Care has
to be taken to ensure that responses to calls from the same connection are serialized
and sent in order.
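
To make the idea concrete, here is a rough sketch of the dispatch loop in plain Java. It is not
the actual RPC.Server code: the class and the names Call, Connection, enqueueResponse, POISON
and slowCall are all made up for illustration, and a list stands in for the socket. It only
shows one possible way to fork a thread per call while still sending responses for a
connection in arrival order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from Hadoop's RPC.Server. */
class DynamicHandlerSketch {

    /** One queued RPC: the connection it arrived on plus the work to run. */
    static final class Call {
        final Connection conn;
        final Supplier<String> work;
        Call(Connection conn, Supplier<String> work) { this.conn = conn; this.work = work; }
    }

    /** Per-connection state; responses are chained so they go out in arrival order. */
    static final class Connection {
        final List<String> sent = new ArrayList<>();  // stands in for the socket
        CompletableFuture<Void> lastResponse = CompletableFuture.completedFuture(null);

        // Called only by the dispatcher thread, in the order calls were taken.
        void enqueueResponse(CompletableFuture<String> result) {
            // Response N is sent only after response N-1 was sent and call N finished.
            lastResponse = lastResponse.thenAcceptBoth(result, (prev, value) -> sent.add(value));
        }
    }

    /** Marker that tells the dispatcher to stop (assumption made for this demo). */
    static final Call POISON = new Call(null, null);

    /** The single "special" handler thread: take a call, fork a thread to run it. */
    static Thread startDispatcher(BlockingQueue<Call> callQueue) {
        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    Call call = callQueue.take();
                    if (call == POISON) return;
                    CompletableFuture<String> result = new CompletableFuture<>();
                    call.conn.enqueueResponse(result);
                    new Thread(() -> {
                        try {
                            result.complete(call.work.get());
                        } catch (RuntimeException e) {
                            result.complete("error: " + e);
                        }
                    }).start();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.start();
        return dispatcher;
    }

    /** A fake RPC body that sleeps for the given time and then returns its name. */
    static Supplier<String> slowCall(String name, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name;
        };
    }

    /** Three calls on one connection; the first call is the slowest to finish. */
    static List<String> demo() {
        BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();
        Thread dispatcher = startDispatcher(callQueue);
        Connection conn = new Connection();
        callQueue.add(new Call(conn, slowCall("a", 300)));
        callQueue.add(new Call(conn, slowCall("b", 100)));
        callQueue.add(new Call(conn, slowCall("c", 0)));
        callQueue.add(POISON);
        try {
            dispatcher.join();  // after this, every call has been forked
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        conn.lastResponse.join();  // wait until the last response was "sent"
        return conn.sent;
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints [a, b, c]
    }
}
```

Because no call ever waits for a free handler, a nested RPC between two datanodes cannot be
starved in this scheme. The cost is an unbounded number of threads, so a real implementation
would probably want some upper limit.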

> Deadlock in Datanode RPC servers
> --------------------------------
>                 Key: HADOOP-3673
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3673
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: dhruba borthakur
>            Priority: Blocker
> There is a deadlock scenario in the way Lease Recovery is triggered using the Datanode
> RPC server via HADOOP-3283.
> Each Datanode has dfs.datanode.handler.count handler threads (default of 3). These handler
> threads are used to support the generation-stamp-dance protocol as described in HADOOP-1700.
> Let me try to explain the scenario with an example. Suppose a cluster has two datanodes.
> Also, let's assume that dfs.datanode.handler.count is set to 1. Suppose that there are two
> clients, each writing to a separate file with a replication factor of 2. Let's assume that
> both clients encounter an IO error and trigger the generation-stamp-dance protocol. The first
> client may invoke recoverBlock on the first datanode while the second client may invoke recoverBlock
> on the second datanode. Now, each datanode will try to make a getBlockMetaDataInfo()
> call to the other datanode. But since each datanode has only 1 server handler thread, and that
> thread is busy in recoverBlock waiting on the other datanode, both threads
> will block for eternity. Deadlock!
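
The scenario above can be reproduced outside Hadoop with a toy model. In the sketch below each
"datanode" is just a fixed-size thread pool standing in for its RPC handler threads. The class
name, Node, and deadlocks() are invented for this illustration; only recoverBlock and
getBlockMetaDataInfo are borrowed from the issue description, and their bodies here are
placeholders. A latch forces both recoverBlock calls to occupy a handler before either makes
its nested call, and a timeout is used to detect the hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of the deadlock; not Hadoop code. */
class HandlerDeadlockSketch {

    /** A pretend datanode whose "RPC server" is a fixed pool of handler threads. */
    static final class Node {
        final ExecutorService handlers;
        Node peer;

        Node(int handlerCount) {
            handlers = Executors.newFixedThreadPool(handlerCount);
        }

        // Cheap call, but it still needs a free handler thread on this node.
        Future<String> getBlockMetaDataInfo() {
            return handlers.submit(() -> "meta");
        }

        // Occupies one handler on this node and blocks on an RPC to the peer.
        Future<String> recoverBlock(CountDownLatch bothStarted) {
            return handlers.submit(() -> {
                bothStarted.countDown();
                bothStarted.await();  // make sure both recoveries hold a handler
                return "recovered with " + peer.getBlockMetaDataInfo().get();
            });
        }
    }

    /** Returns true if the two recoverBlock calls fail to finish within the timeout. */
    static boolean deadlocks(int handlerCount) {
        Node a = new Node(handlerCount);
        Node b = new Node(handlerCount);
        a.peer = b;
        b.peer = a;
        CountDownLatch bothStarted = new CountDownLatch(2);
        Future<String> fa = a.recoverBlock(bothStarted);
        Future<String> fb = b.recoverBlock(bothStarted);
        try {
            fa.get(1, TimeUnit.SECONDS);
            fb.get(1, TimeUnit.SECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;  // nested calls are queued behind the blocked handlers
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            a.handlers.shutdownNow();  // interrupt the stuck handlers so the JVM can exit
            b.handlers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 handler:  deadlock = " + deadlocks(1));
        System.out.println("2 handlers: deadlock = " + deadlocks(2));
    }
}
```

With one handler per node the run times out. With two handlers this particular pair of
recoveries completes, though more concurrent recoveries could exhaust any fixed handler count
in the same way.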

