hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-1754) use TCP keepalives
Date Tue, 18 Aug 2009 02:09:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1754:

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to branch and trunk.

> use TCP keepalives
> ------------------
>                 Key: HBASE-1754
>                 URL: https://issues.apache.org/jira/browse/HBASE-1754
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.20.0, 0.21.0
>         Attachments: HBASE-1754.patch
> If a regionserver crashes while the client is engaged in IPC with it at a vulnerable
> point in the TCP FSM (ESTABLISHED, no outstanding data to send), the IPC will be stuck waiting
> "forever" (> 12 hours, etc.). This hoses the client, especially if it is trying to look
> up a region in META. Worse, it is not possible to restart the regionserver if the hung client
> is colocated with it on the same host, because the OS will consider port 60020 bound and in
> use, unless the client is forcibly killed. Killing some types of applications -- especially
> long running processes which can't redo work from a checkpoint but must start over from the
> beginning -- can be very painful. Investigate if TCP keepalives can be enabled at the IPC
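
For illustration only, here is a minimal sketch of what enabling TCP keepalives on a
client socket looks like in plain Java. It is not the HBASE-1754 patch and does not use
the HBase IPC classes; the class name KeepAliveSketch and the helper connectWithKeepAlive
are hypothetical. It relies only on java.net.Socket.setKeepAlive, which turns on
SO_KEEPALIVE so that the OS probes an idle ESTABLISHED connection and eventually reports
a dead peer instead of leaving a blocked read waiting indefinitely.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch; not the code committed for HBASE-1754.
public class KeepAliveSketch {

    /** Creates a client socket with SO_KEEPALIVE set before connecting. */
    public static Socket connectWithKeepAlive(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        // Ask the OS to send keepalive probes when the connection is idle.
        socket.setKeepAlive(true);
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // A throwaway listener on an ephemeral loopback port stands in for a regionserver.
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connectWithKeepAlive(
                     loopback.getHostAddress(), server.getLocalPort(), 1000)) {
            System.out.println("keepalive enabled: " + client.getKeepAlive());
        }
    }
}
```

Note that the socket option only switches probing on. How long a connection must sit
idle before the first probe, and how many probes are sent, is decided by the operating
system (on Linux the default idle time is commonly two hours), so a hung client would
still wait that long unless the kernel settings are tuned.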

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
