hbase-issues mailing list archives

From "Mikhail Antonov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16960) RegionServer hang when aborting
Date Thu, 03 Nov 2016 10:22:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632308#comment-15632308 ]

Mikhail Antonov commented on HBASE-16960:
-----------------------------------------

Good job!

I skimmed the patch and it looks good to me, but I want to come back and dig into it more this week to
see if there are similar possible issues around it. Appends failing due to a socket timeout
on the DN are to be expected, I'd say, but I don't think I've seen this... How bad is this for
you? How frequently do you see it, [~carp84] and [~aoxiang]?

"Actually binlijin and I also observed more questions on whether the current implementation
could assure the semantic that "failed appends won't get synced successfully", and we're still
digging into it. Will open another JIRA if any solution."

Any follow-ups on that? It seems there are a few other changes to the WALs either done
or in flight, but they seem too big to get into 1.3.0 and need to be carefully stress tested.
I'm thinking of moving it to 1.3.1, where I'd bring those changes in and let them bake. Thoughts?
(That depends on how bad this issue is.)


> RegionServer hang when aborting
> -------------------------------
>
>                 Key: HBASE-16960
>                 URL: https://issues.apache.org/jira/browse/HBASE-16960
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: 16960.ut.missing.final.piece.txt, HBASE-16960.branch-1.v1.patch,
> HBASE-16960.patch, HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, HBASE-16960_master_v4.patch,
> RingBufferEventHandler.png, RingBufferEventHandler_exception.png, SyncFuture.png, SyncFuture_exception.png,
> rs1081.jstack
>
>
> Several times we have seen a regionserver hang while aborting. This takes all regions on that
> regionserver out of service, and all affected applications then stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
