hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
Date Wed, 07 Sep 2011 23:01:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099637#comment-13099637
] 

Jean-Daniel Cryans commented on HBASE-4015:
-------------------------------------------

@Ted, thanks I didn't see it amid the rest.

@Ram, did you insert any data in those regions before killing the RSs? Replaying the edits
usually a good chunk of time for the region to be reopened. You could also try doing a worst
case cold startup by killing -9 all HBase components at the same time (more or less) and then
restarting them all (also after data was added). Finally you could try setting a super low
timeout setting, like 5 seconds, to trigger RIT timeouts by the hundreds.

> Refactor the TimeoutMonitor to make it less racy
> ------------------------------------------------
>
>                 Key: HBASE-4015
>                 URL: https://issues.apache.org/jira/browse/HBASE-4015
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.90.3
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, HBASE-4015_reprepared_trunk_2.patch,
Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition generator,
mostly making things worse rather than better. It does it's own thing for a while without
caring for what's happening in the rest of the master.
> The first thing that needs to happen is that the regions should not be processed in one
big batch, because that sometimes can take minutes to process (meanwhile a region that timed
out opening might have opened, then what happens is it will be reassigned by the TimeoutMonitor
generating the never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure how to do
it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message