hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
Date Thu, 07 Aug 2014 22:07:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089923#comment-14089923 ]

Lars Hofhansl commented on HBASE-11695:
---------------------------------------

Can't try on the production site easily right now.
This was a dud anyway, in the sense that it does not cause the issue. It's just an oddity
we observed. The problem was that flushes took a very long time (hours); not sure why yet, but
probably due to a networking issue. Hence all flushes were waiting, and after one hour all
the waiting regions became eligible for the periodic flusher.

So the problem here is only cosmetic. Because the flusher's wake interval is usually shorter
than the random delay (jitter), each region typically requests a flush twice.
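
To illustrate the double request, here is a minimal Python sketch of the timing. It is not
HBase code. It assumes (as assumptions of the sketch, not confirmed behavior) that the flusher
re-requests a flush on every wake-up until the region has actually been flushed, and that the
flush starts once the first request's random delay expires. The function name and parameters
are hypothetical.

```python
def count_flush_requests(first_delay_ms, wake_period_ms, flush_duration_ms=0):
    """Count how many periodic wake-ups request a flush for one region.

    Assumed model (hypothetical, for illustration only):
      - the region becomes eligible at t = 0 and the flusher wakes at
        t = 0, wake_period_ms, 2 * wake_period_ms, ...
      - every wake-up that still sees the region unflushed requests a flush
      - the flush starts after the first request's delay and finishes
        flush_duration_ms later
    """
    if wake_period_ms <= 0:
        raise ValueError("wake_period_ms must be positive")

    done_at = first_delay_ms + flush_duration_ms
    requests = 0
    t = 0
    while True:
        requests += 1          # the wake-up at time t requests a flush
        t += wake_period_ms    # time of the next wake-up
        if t >= done_at:       # region is flushed by then, so no new request
            break
    return requests


if __name__ == "__main__":
    print(count_flush_requests(13449, 10000))
    print(count_flush_requests(13449, 20000))
```

With the first delay from the log in the issue description (13449 ms) and a 10s wake period,
the sketch counts two requests. A 20s period gives one. If the flush itself is slow, the count
keeps growing with every wake-up.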

> PeriodicFlusher and WakeFrequency issues
> ----------------------------------------
>
>                 Key: HBASE-11695
>                 URL: https://issues.apache.org/jira/browse/HBASE-11695
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.21
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>
>         Attachments: 11695-trunk.txt
>
>
> We just ran into a flush storm caused by the PeriodicFlusher.
> Many memstores became eligible for flushing at exactly the same time. The effect we've seen
> is that the exact same region was flushed multiple times, because the flusher wakes up too
> often (every 10s). The jitter of 20s is larger than that, and it takes some time to actually
> flush the memstore.
> Here's one example. We've seen hundreds of these, monopolizing the flush queue and preventing
> "important" flushes from happening.
> {code}
> 06-Aug-2014 20:11:56  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449
> 06-Aug-2014 20:12:06  [regionserver60020.periodicFlusher] INFO  org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060
> {code}
> So we need to increase the period of the PeriodicFlusher to at least the random jitter,
> and also increase the default random jitter (20s does not help with many regions).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
