hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-1754) indefinite hang in IPC under some circumstances
Date Sat, 08 Aug 2009 18:19:14 GMT
indefinite hang in IPC under some circumstances
-----------------------------------------------

                 Key: HBASE-1754
                 URL: https://issues.apache.org/jira/browse/HBASE-1754
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Andrew Purtell


If a regionserver crashes while a client is engaged in IPC with it at a vulnerable point
in the TCP FSM (ESTABLISHED, no outstanding data to send), the IPC call waits indefinitely.
It unblocks only when the regionserver is restarted, at which point the connection is reset
at the TCP level. However, if the client is colocated with the regionserver on the same host,
the regionserver cannot be restarted unless the client is forcibly killed, because the OS
still considers port 60020 bound and in use. Killing some types of applications, especially
long-running processes that cannot redo work from a checkpoint and must start over from the
beginning, can be very painful. Investigate whether TCP keepalives can be enabled at the IPC level.
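
As a starting point for that investigation, here is a minimal sketch of enabling SO_KEEPALIVE on a client socket with the standard java.net.Socket API. The class and method names (KeepAliveSketch, openWithKeepAlive) are hypothetical and are not taken from the HBase IPC code. The sketch only shows the socket option and does not show where the IPC client would set it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of HBase or Hadoop IPC.
public class KeepAliveSketch {

    /**
     * Opens a TCP connection with SO_KEEPALIVE enabled, so the OS
     * periodically probes an idle ESTABLISHED connection and reports an
     * error to the application if the peer no longer responds.
     */
    static Socket openWithKeepAlive(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket socket = new Socket();          // unconnected socket
        try {
            socket.setKeepAlive(true);         // enable SO_KEEPALIVE before connecting
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close();                    // do not leak the descriptor on failure
            throw e;
        }
    }
}
```

Note that the probe timing is controlled by the operating system rather than by this option. On Linux the default idle time before the first probe is two hours, so a dead peer would not be detected quickly unless the system settings are tuned as well.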

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

