hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1062) Compactions at (re)start on a large table can overwhelm DFS
Date Thu, 18 Dec 2008 03:14:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657660#action_12657660 ]

Andrew Purtell commented on HBASE-1062:

> Is it wise postponing memcache flushes? 

My thinking was that safe mode should essentially mean "don't touch DFS".

> We schedule compactions on open and on flush. This would put off the open scheduling
> for interval of 2 minutes. If cluster went down ugly, and some regions had References
> outstanding, then these regions would not be splittable

Wouldn't the references be cleared once the deferred compactions are finally allowed to run?
The split would then happen. That is what I observe in testing.

> Do we ever break out of this loop [...] Looks like we increment count then set it to
> after sleep. It never progresses?

The code in question just sleeps (once) during the CompactSplitThread main loop if count becomes
greater than limit, then count is reset.
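
To illustrate the behavior described above, here is a minimal sketch in Java. It is not the code from 1062-1.patch: the class name ThrottledWorkLoop, the fields limit and pauseMillis, and the injected sleeper are all assumptions made for illustration. It shows only the pattern of sleeping once when count exceeds limit and then resetting count, so the loop keeps progressing.

```java
import java.util.Queue;
import java.util.function.LongConsumer;

/** Hypothetical sketch of the throttle pattern; not the actual CompactSplitThread code. */
final class ThrottledWorkLoop {
    private final int limit;            // items allowed before a pause (assumed name)
    private final long pauseMillis;     // length of the single pause (assumed name)
    private final LongConsumer sleeper; // injected so the sketch can run without real sleeping
    private int count = 0;
    private int pauses = 0;

    ThrottledWorkLoop(int limit, long pauseMillis, LongConsumer sleeper) {
        this.limit = limit;
        this.pauseMillis = pauseMillis;
        this.sleeper = sleeper;
    }

    /** Runs every queued task, pausing once each time count exceeds limit. */
    void drain(Queue<Runnable> queue) {
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
            count++;
            if (count > limit) {
                sleeper.accept(pauseMillis); // a single sleep, not a wait loop
                count = 0;                   // reset, so the main loop moves on
                pauses++;
            }
        }
    }

    int pauses() {
        return pauses;
    }
}
```

In a real thread the sleeper would wrap Thread.sleep and handle interruption; it is passed in here only so the pattern can be exercised without waiting.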

It looks like I still need to make the compact/split ramp-up more gradual (a longer slope),
at least given our cluster and circumstances. The current patch helps, but we can still
overwhelm DFS sometimes after a restart.
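
One way to picture a "longer slope" is a limit that grows linearly with time since startup. The sketch below is purely hypothetical: RampUpLimit, minLimit, maxLimit and rampMillis are invented names and do not come from the patch or from HBase configuration. Stretching rampMillis keeps the allowed amount of compact/split work low for longer after a restart.

```java
/** Hypothetical linear ramp-up of a work limit; names and shape are assumptions. */
final class RampUpLimit {
    private final int minLimit;    // limit applied right after startup
    private final int maxLimit;    // limit applied once the ramp has finished
    private final long rampMillis; // duration of the ramp; larger means a longer slope

    RampUpLimit(int minLimit, int maxLimit, long rampMillis) {
        if (minLimit < 0 || maxLimit < minLimit || rampMillis <= 0) {
            throw new IllegalArgumentException("invalid ramp parameters");
        }
        this.minLimit = minLimit;
        this.maxLimit = maxLimit;
        this.rampMillis = rampMillis;
    }

    /** Returns the limit allowed after the given time since startup. */
    int limitAt(long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return minLimit;
        }
        if (elapsedMillis >= rampMillis) {
            return maxLimit;
        }
        long span = (long) maxLimit - minLimit;
        // Linear interpolation, rounded down by integer division.
        return (int) (minLimit + span * elapsedMillis / rampMillis);
    }
}
```

A throttle could consult limitAt(elapsed) each time it decides whether to pause, instead of comparing against a fixed limit.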

> Compactions at (re)start on a large table can overwhelm DFS
> -----------------------------------------------------------
>                 Key: HBASE-1062
>                 URL: https://issues.apache.org/jira/browse/HBASE-1062
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Critical
>             Fix For: 0.20.0
>         Attachments: 1062-1.patch
> Given a large table, > 1000 regions for example, if a cluster restart is necessary,
> the compactions undertaken by the regionservers when the master makes initial region assignments
> can overwhelm DFS, leading to file errors and data loss. This condition is exacerbated if
> write load was heavy before restart and so many regions want to split as soon as they are

