hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2789) Race condition in ipc.Server prevents responce being written back to client.
Date Thu, 07 Feb 2008 00:51:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566420#action_12566420 ]

Raghu Angadi commented on HADOOP-2789:

One possible issue is that the Responder cancels the key because there are no more responses, but
then an IPC handler tries to register the channel again before the next select() is called, resulting
in a CancelledKeyException. The JavaDoc for Selector is not very explicit, but it does seem to imply
that keys are removed from the cancelled-key set only inside a select(). I saw the following exception
once for the test (without the proposed fix patch):
2008-02-07 00:17:17,276 INFO  ipc.Server (Server.java:run(937)) - IPC Server handler 1 on
42215 caught: java.nio.channels.CancelledKeyExce
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:175)
        at java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:630)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:666)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:930)
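
The suspected sequence can be reproduced outside Hadoop. The following is only a minimal sketch, not the ipc.Server code: the class and method names are hypothetical, and a java.nio Pipe stands in for the client socket. It cancels a key and registers the same channel again, once without an intervening select and once with a selectNow() in between.

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of Hadoop.
final class CancelledKeyDemo {

  // Cancels the key, optionally runs a select, then registers again.
  // Returns true if the second register() threw CancelledKeyException.
  static boolean reRegisterThrows(boolean selectInBetween) throws IOException {
    Pipe pipe = Pipe.open();
    try (Selector selector = Selector.open();
         Pipe.SourceChannel source = pipe.source();
         Pipe.SinkChannel sink = pipe.sink()) {
      sink.configureBlocking(false);
      SelectionKey key = sink.register(selector, SelectionKey.OP_WRITE);

      // What the Responder does once it has no more responses to write.
      key.cancel();

      if (selectInBetween) {
        // A selection operation deregisters keys in the cancelled-key set.
        selector.selectNow();
      }

      try {
        // What a handler does when it needs to write another response.
        sink.register(selector, SelectionKey.OP_WRITE);
        return false;
      } catch (CancelledKeyException e) {
        return true;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("no select in between: throws=" + reRegisterThrows(false));
    System.out.println("select in between:    throws=" + reRegisterThrows(true));
  }
}
```

This matches the documented contract of SelectableChannel.register(), which throws CancelledKeyException when the channel is still registered with the selector but its key has already been cancelled.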

> Race condition in ipc.Server prevents responce being written back to client.
> ----------------------------------------------------------------------------
>                 Key: HADOOP-2789
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2789
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.16.0
>            Reporter: Clint Morgan
>            Priority: Critical
>         Attachments: HADOOP-2789.patch
> I encountered a race condition in ipc.Server when writing the response
> back to the socket. Sometimes the write SelectionKey is being canceled
> when it should not be, and thus the full response never gets
> written. This results in clients timing out on the socket while waiting for the response.
> I am attaching a unit test that demonstrates the problem. It follows
> closely the TestIPC test, however the socket output buffer is set
> smaller than the result being sent back, so that partial writes
> occur. I also put a random sleep in the client to help provoke the race
> condition.
> On my machine this fails over half of the time.
> Looking at the code in ipc.Server.java, the problem manifests in
> Responder.doAsyncWrite(). If I comment out the key.cancel() line, then
> everything works fine.
> So we need to identify when to safely cancel the key.
> I tried the following:
> {noformat}
>     private void doAsyncWrite(SelectionKey key) throws IOException {
>       Call call = (Call)key.attachment();
>       if (call == null) {
>         return;
>       }
>       if (key.channel() != call.connection.channel) {
>         throw new IOException("doAsyncWrite: bad channel");
>       }
>       if (processResponse(call.connection.responseQueue)) {
>         synchronized (call.connection.responseQueue) {
>           if (call.connection.responseQueue.size() == 0) {
>             LOG.info("Cancelling key for call " + call.toString() + " key: " + key.toString());
>             key.cancel();          // remove item from selector.
>           } else {
>             LOG.warn("NOT REALLY DONE: " + call.toString() + " key: " + key.toString());
>           }
>         }
>       }
>     }
> {noformat}
> And this does catch some of the cases (e.g., the LOG.warn message gets hit), but I still
> hit the race condition.
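
On the question of when the key can be safely cancelled, one option that avoids the question is to not cancel at all and clear the key's interest set instead. The sketch below is only an illustration under that assumption, not the attached patch: the class and method names are hypothetical, and a Pipe again stands in for the client socket. It shows that a key whose interest set was cleared stays valid, so a later register() on the same channel just updates and returns that key.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of Hadoop and not the HADOOP-2789 patch.
final class ClearInterestDemo {

  // Returns true if register() after interestOps(0) reuses the same valid key.
  static boolean reRegisterReusesKey() throws IOException {
    Pipe pipe = Pipe.open();
    try (Selector selector = Selector.open();
         Pipe.SourceChannel source = pipe.source();
         Pipe.SinkChannel sink = pipe.sink()) {
      sink.configureBlocking(false);
      SelectionKey key = sink.register(selector, SelectionKey.OP_WRITE);

      // Stop write notifications without invalidating the key.
      key.interestOps(0);

      // No select() has run in between; the existing key is updated and returned.
      SelectionKey again = sink.register(selector, SelectionKey.OP_WRITE);
      return again == key
          && key.isValid()
          && key.interestOps() == SelectionKey.OP_WRITE;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("key reused: " + reRegisterReusesKey());
  }
}
```

The trade-off is that the key stays registered with the selector until the channel is closed, so this only fits if connections are cleaned up elsewhere.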

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.