hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16960) RegionServer hang when aborting
Date Tue, 01 Nov 2016 03:46:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624243#comment-15624243 ]

stack commented on HBASE-16960:

This makes sense:

      // SyncFuture reuse by thread, if TimeoutIOException happens, ringbuffer
      // still refer to it, so if this thread use it next time may get a wrong
      // result.

... Must have taken a while to figure out.
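
To make the hazard in that comment concrete, here is a minimal single-threaded sketch. It is an illustration only: SimpleSyncFuture, Entry, and the ArrayDeque standing in for the ringbuffer are hypothetical names, not HBase's actual classes, and the timeout is simulated by simply not waiting on the first sync.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SyncFutureReuseSketch {

    /** Hypothetical stand-in for a sync future; NOT HBase's actual SyncFuture. */
    static final class SimpleSyncFuture {
        private long doneTxid = -1; // -1 means "not completed yet"

        SimpleSyncFuture reset() { doneTxid = -1; return this; }
        void done(long txid) { doneTxid = txid; }
        long doneTxid() { return doneTxid; }
    }

    /** One queued sync request: the future to complete and the txid it was issued for. */
    static final class Entry {
        final SimpleSyncFuture future;
        final long txid;
        Entry(SimpleSyncFuture future, long txid) { this.future = future; this.txid = txid; }
    }

    /**
     * Returns the txid the second sync observes after the consumer has
     * processed only the stale first entry (-1 if the second sync is still pending).
     */
    static long secondSyncObserves(boolean reuseAfterTimeout) {
        Queue<Entry> ring = new ArrayDeque<>();           // stand-in for the ringbuffer
        SimpleSyncFuture cached = new SimpleSyncFuture(); // the thread's cached future

        // First sync (txid 1): published, then the caller gives up waiting (simulated timeout).
        ring.add(new Entry(cached.reset(), 1L));

        // Second sync (txid 2): either reuse the cached future or allocate a fresh one.
        SimpleSyncFuture second = reuseAfterTimeout ? cached.reset() : new SimpleSyncFuture();
        ring.add(new Entry(second, 2L));

        // The consumer finally gets around to the stale first entry only.
        Entry stale = ring.remove();
        stale.future.done(stale.txid);

        return second.doneTxid();
    }

    public static void main(String[] args) {
        System.out.println("reuse: " + secondSyncObserves(true));
        System.out.println("fresh: " + secondSyncObserves(false));
    }
}
```

With reuse, the second sync sees a completion that belongs to the first request, which is the "wrong result" the comment warns about. With a fresh future after the timeout, the second sync stays pending until its own entry is processed.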

Patch looks good to me, [~aoxiang]. Does the test reproduce the scenario you've run into? And
when you do reproduce the lockup, does freeing the SyncFutures unblock us?

I'll attach a patch I've been working on.  I am missing a final ingredient because it is
not locking up yet. I was going to work on it this evening but if your patch does the job,
I'll give up on it.

Thanks [~aoxiang] for fixing this stuff.

> RegionServer hang when aborting
> -------------------------------
>                 Key: HBASE-16960
>                 URL: https://issues.apache.org/jira/browse/HBASE-16960
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16960.patch, HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch,
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png, SyncFuture.png, SyncFuture_exception.png,
> We have seen the regionserver hang while aborting several times. This takes all regions on that
> regionserver out of service, and all affected applications then stop working.

This message was sent by Atlassian JIRA
