hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14028) DistributedLogReplay drops edits when ITBLL 125M
Date Wed, 08 Jul 2015 05:17:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617997#comment-14617997 ]

stack commented on HBASE-14028:
-------------------------------

I have been playing more with this. Losing data is pretty easy to do. I'm trying to find out why
the end of a WAL goes missing during replay, but there is not enough info to debug it, and it is
a little tough to trace where replay is at any one time. Trying to backfill the missing info.

> DistributedLogReplay drops edits when ITBLL 125M
> ------------------------------------------------
>
>                 Key: HBASE-14028
>                 URL: https://issues.apache.org/jira/browse/HBASE-14028
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery
>    Affects Versions: 1.2.0
>            Reporter: stack
>
> Testing DLR before 1.2.0RC gets cut, I find we are dropping edits.
> The issue seems to be around replay into a deployed region on a server that dies before all
> edits have finished replaying. Logging of sequenceid accounting is sparse, so I can't tell for
> sure how it is happening (or whether our accounting, now done per Store, is messing up DLR).
> Digging.
> I also notice that DLR does not refresh its cached region location on error; it just keeps
> retrying until the whole WAL fails (8 retries, about 30 seconds). We could refactor a bit so that
> replay finds the region at its new location if it moves during DLR replay.
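
The description's open question about per-Store sequenceid accounting can be pictured with a minimal Python sketch. It assumes a simplified model, not HBase's actual code: each Store (column family) reports the highest sequence id it has already persisted, and replay skips any WAL edit at or below that id for its family. `WalEdit` and `edits_to_replay` are hypothetical names. The second call shows how accounting that runs ahead of what was actually applied (for example, because the replay target died mid-replay) would silently skip edits at the tail of a WAL. That would be consistent with the symptom, though the comment does not establish it as the cause.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WalEdit:
    """Hypothetical, simplified WAL entry: one cell for one column family."""
    seq_id: int
    family: str
    row: str


def edits_to_replay(edits: List[WalEdit],
                    max_persisted_seq_id: Dict[str, int]) -> List[WalEdit]:
    """Keep only edits newer than what each Store reports as already persisted.

    A family missing from the map is treated as having persisted nothing.
    """
    return [e for e in edits
            if e.seq_id > max_persisted_seq_id.get(e.family, -1)]


wal = [WalEdit(10, "f1", "r1"), WalEdit(11, "f2", "r1"),
       WalEdit(12, "f1", "r2"), WalEdit(13, "f2", "r2")]

# Accurate accounting: f1 persisted through 10, f2 through 11.
print([e.seq_id for e in edits_to_replay(wal, {"f1": 10, "f2": 11})])  # [12, 13]

# Accounting that ran ahead: f2 claims 13 although edit 13 never landed.
# The tail edit is silently skipped.
print([e.seq_id for e in edits_to_replay(wal, {"f1": 10, "f2": 13})])  # [12]
```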

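The refactor suggested at the end of the description could look roughly like the following Python sketch: on a failed send, drop the cached region location and look it up again before the next attempt, instead of retrying the stale server until the whole WAL fails. Everything here is hypothetical (`LocationCache`, `RegionNotServedError`, `replay_with_refresh`, and the `lookup`/`send` callables standing in for a meta lookup and the replay RPC); it is not HBase's replay API, and it omits the backoff between retries that accounts for the roughly 30 seconds mentioned in the description.

```python
from typing import Callable, Dict, List


class RegionNotServedError(Exception):
    """Hypothetical error: the target server no longer hosts the region."""


class LocationCache:
    """Hypothetical region -> server cache backed by a lookup function."""

    def __init__(self, lookup: Callable[[str], str]):
        self._lookup = lookup
        self._cache: Dict[str, str] = {}

    def locate(self, region: str) -> str:
        # Consult the lookup only on a cache miss.
        if region not in self._cache:
            self._cache[region] = self._lookup(region)
        return self._cache[region]

    def invalidate(self, region: str) -> None:
        self._cache.pop(region, None)


def replay_with_refresh(region: str, edits: List[str], cache: LocationCache,
                        send: Callable[[str, str, List[str]], None],
                        max_retries: int = 8) -> str:
    """Send edits to the region's server, re-locating the region after each
    failure. Returns the server that finally accepted the edits."""
    last_error = None
    for _ in range(max_retries):
        server = cache.locate(region)
        try:
            send(server, region, edits)
            return server
        except RegionNotServedError as err:
            last_error = err
            # Forget the stale location so the next attempt looks it up again.
            cache.invalidate(region)
    raise RuntimeError(
        f"replay of {region} failed after {max_retries} attempts") from last_error


# Usage: region-a starts on rs1, which has died; it gets reassigned to rs2.
current = {"region-a": "rs1"}   # what the (simulated) meta lookup returns
live_servers = {"rs2"}
applied: List[str] = []


def send(server: str, region: str, edits: List[str]) -> None:
    if server not in live_servers:
        # Simulate the region being reassigned once the dead server is noticed.
        current[region] = "rs2"
        raise RegionNotServedError(server)
    applied.extend(edits)


cache = LocationCache(lambda r: current[r])
print(replay_with_refresh("region-a", ["e1", "e2"], cache, send))  # rs2
```

Invalidating on every failure is the simplest policy; a real client would more likely invalidate only on errors that indicate the region moved or its server is gone.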


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
