hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1049) race condition in setting up ipc connections
Date Wed, 28 Feb 2007 17:32:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476650 ]

Devaraj Das commented on HADOOP-1049:

I looked through the code for possible race conditions. One possible case arises when there
is an error in connecting to a server. In that case, the values of the various fields are:
socket = some valid value, inUse = 0, shouldCloseConnection = false, in = null
At this point, the connection thread is blocked in wait() (inside waitForWork).
Now, assuming that the ConnectionCuller has not yet killed the connection (removed it from
the cache), if another attempt is made to connect to the same server, the ref count on the
connection object is incremented. The call to setupIOstreams notifies the connection thread
that there is work to be done and returns immediately (because the socket is non-null). The
connection thread wakes up and finds the values:
socket = some valid value, inUse = 1, shouldCloseConnection = false, in = null
So waitForWork returns "true". The next statement in the connection thread's run method,
"in.readInt", then executes, and since "in" is null we get a NullPointerException.
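
To make the sequence above concrete, here is a minimal single-threaded sketch that replays it.
It is a simplified, hypothetical model and not the actual Client.java code: the class name, the
helper methods failedConnect, secondAttempt and simulate, and the exact condition inside
waitForWork are assumptions made for illustration. Only the field names and the order of events
come from the description above.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Simplified, hypothetical model of the state described above.
// This is NOT the actual org.apache.hadoop.ipc.Client code.
class ConnectionStateSketch {
    Object socket;                 // stands in for the java.net.Socket field
    int inUse;                     // reference count
    boolean shouldCloseConnection;
    DataInputStream in;            // only assigned once the streams are set up

    // State left behind by a failed connect: socket is non-null, "in" was never assigned.
    void failedConnect() {
        socket = new Object();
        inUse = 0;
        shouldCloseConnection = false;
        in = null;
    }

    // A second caller reuses the cached connection.
    void secondAttempt() {
        inUse++;                   // ref count is incremented
        setupIOstreams();
    }

    // Returns early when socket is non-null, so "in" stays null.
    void setupIOstreams() {
        if (socket != null) {
            return;                // the real code would also notify the connection thread
        }
        // (stream setup omitted in this sketch)
    }

    // Assumed wake-up condition for the connection thread.
    boolean waitForWork() {
        return inUse > 0 && !shouldCloseConnection;
    }

    // Replays the sequence single-threaded and reports what the read step does.
    static String simulate() {
        ConnectionStateSketch c = new ConnectionStateSketch();
        c.failedConnect();
        c.secondAttempt();
        if (!c.waitForWork()) {
            return "no work";
        }
        try {
            c.in.readInt();        // "in" is still null at this point
            return "read ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

The sketch leaves out the real wait()/notify() handshake on purpose: the failure follows from
the field values alone, so no particular thread timing is needed to reproduce it.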

When the patch to HADOOP-312 was committed, the socket.connect call was not there; instead,
the socket would always be null if the connection to the server in question could not be
established. A later patch changed this behaviour (to include a timeout) to:
socket = new Socket();
socket.connect(address, timeout); 
So, regardless of whether we could connect to the server, socket always has a valid
non-null value. Unfortunately, this breaks the logic of the IPC client, which takes a
non-null socket to mean that the connection has been set up.
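
The point about the socket reference can be checked in isolation. The following is a small
sketch, not Hadoop code: the class and method names are hypothetical, and the unresolved
address "unreachable.invalid" is only a stand-in for a server that cannot be reached. It uses
the same two calls as above and shows that a failed connect leaves the variable non-null.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, for illustration only.
class FailedConnectSketch {
    // Creates the socket first and connects second, as in the changed code.
    static Socket connectAndIgnoreFailure(InetSocketAddress address, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(address, timeoutMillis);
        } catch (IOException e) {
            // connect failed, but "socket" still refers to the object created above
        }
        return socket;
    }

    public static void main(String[] args) {
        // An unresolved address makes connect() fail without touching the network.
        InetSocketAddress unreachable =
            InetSocketAddress.createUnresolved("unreachable.invalid", 9000);
        Socket s = connectAndIgnoreFailure(unreachable, 1000);
        System.out.println("non-null: " + (s != null) + ", connected: " + s.isConnected());
    }
}
```

A null check on the socket therefore says nothing about whether the connection was actually
established; isConnected() would be the call that distinguishes the two cases.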

A fix for this would be to set socket to null if we could not connect to the server after
maxRetries retries (today, only inUse is set to zero when this condition becomes true).
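
A rough sketch of what that could look like is below. This is not the actual patch: the class
name, the connectWithRetries method and its parameters are hypothetical, and the retry loop is
only one way to read "after maxRetries retries". The part that matters is the last few lines of
the method, where socket is reset to null together with inUse.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;

// Hypothetical sketch of the proposed fix; not the real Client.Connection class.
class NullOnFailureSketch {
    Socket socket;   // null means "no established connection"
    int inUse;       // reference count

    // Tries once plus up to maxRetries more times; returns true if a connect succeeded.
    boolean connectWithRetries(SocketAddress address, int timeoutMillis, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Socket candidate = new Socket();
            try {
                candidate.connect(address, timeoutMillis);
                socket = candidate;          // publish only a connected socket
                return true;
            } catch (IOException e) {
                try {
                    candidate.close();       // discard the failed socket
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
        // All attempts failed: restore the "not connected" state so that a later
        // setupIOstreams call does not mistake this for a live connection.
        socket = null;
        inUse = 0;
        return false;
    }
}
```

Assigning the field only after connect succeeds is a slightly stronger variant of the same
idea, since the field then never holds an unconnected socket at any point.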

> race condition in setting up ipc connections
> --------------------------------------------
>                 Key: HADOOP-1049
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1049
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.11.2
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
> While running svn head, I get:
> [junit] 2007-02-27 19:11:17,707 INFO  ipc.Client (Client.java:run(281)) - java.lang.NullPointerException
>     [junit] 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:251)
> There is a race condition between when the threads are created above and when the IO
> streams are set up below.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
