hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15436) BufferedMutatorImpl.flush() appears to get stuck
Date Thu, 17 Mar 2016 07:11:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198866#comment-15198866 ]

Anoop Sam John commented on HBASE-15436:

There are some must-fix things:
1. The BufferedMutator flush keeps retrying and taking more time. It kicked in because the size
of all Mutations accumulated so far met the flush size (say 2 MB). While that flush is running,
we keep on accepting new mutations into the list, which can lead to a client-side OOME!
Accepting some more mutations after a background flush has started is fine, and normally things
will get moving fast enough, but this cannot be unbounded. There should be a cap size above
which we block the writes and do not take any more, maybe something like 1.5 times the flush
size. (A rough sketch of this follows the list.)
2. The row lookups into META happen one row at a time. That is how one row lookup here failed
only after 36 retries, each with a 1 min timeout. Isn't the 1 min timeout itself too high? And
even after all that, it just fails this one Mutation and continues with the remaining ones.
What if we did a multi Get against the META table to find the region locations for N mutations
at a time? (See the second sketch below.)
3. When close() is explicitly called on the BufferedMutator, we try for a graceful shutdown
(i.e. wait for any in-progress flush and/or call flush before close). In such a case, what if
the cluster is down and this takes too long? How long should we wait, and should we come out
faster? That might mean losing some Mutations, but at least the loss would be known. (The third
sketch below shows one bounded-wait shape.)
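
A minimal sketch of the cap in point 1, written as a hypothetical wrapper class (CappedMutator
and its flushDone callback are illustrative names, not HBase API; the 1.5x factor is the one
suggested above):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Mutation;

// Hypothetical wrapper (not part of HBase) illustrating the cap from point 1:
// block callers once pending data crosses 1.5x the flush size, instead of
// accumulating mutations without bound while a slow flush is in progress.
public class CappedMutator {
  private final BufferedMutator mutator; // the real HBase client mutator
  private final long hardCap;            // 1.5x the flush size

  private long pendingBytes = 0;         // rough size of not-yet-flushed Mutations

  public CappedMutator(BufferedMutator mutator, long flushSize) {
    this.mutator = mutator;
    this.hardCap = (long) (1.5 * flushSize);
  }

  public synchronized void mutate(Mutation m) throws IOException, InterruptedException {
    // Block the writer until the background flush drains the buffer, rather
    // than letting the client heap grow until OOME.
    while (pendingBytes + m.heapSize() > hardCap) {
      wait();
    }
    pendingBytes += m.heapSize();
    mutator.mutate(m);
  }

  public synchronized void flushDone(long flushedBytes) {
    // Hypothetical callback invoked when a background flush completes.
    pendingBytes -= flushedBytes;
    notifyAll();
  }
}
{code}

Blocking the producer keeps client memory bounded, while the 1.5x headroom still absorbs short
flush delays without stalling writers.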
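For point 2, the client API already supports batched reads via Table#get(List<Get>); a multi
Get against META could look roughly like this (BatchedMetaLookup and buildMetaRowKey are
hypothetical helpers; the real meta row-key encoding is more involved and handled inside the
client):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

// Illustrative only: batch N meta-table reads into one RPC via Table#get(List<Get>),
// rather than looking up one row location at a time, each with its own
// 36-retry / per-attempt-timeout budget.
public class BatchedMetaLookup {
  public static Result[] lookup(Connection conn, List<byte[]> userRows) throws IOException {
    List<Get> gets = new ArrayList<>(userRows.size());
    for (byte[] row : userRows) {
      gets.add(new Get(buildMetaRowKey(row))); // hypothetical key construction
    }
    try (Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
      // One multi Get RPC for N region locations instead of N serial lookups.
      return meta.get(gets);
    }
  }

  private static byte[] buildMetaRowKey(byte[] userRow) {
    // Placeholder: the actual meta row key is "<table>,<startRow>,<timestamp>...".
    return userRow;
  }
}
{code}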
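And for point 3, one possible bounded-wait shape for close(), sketched with a plain executor
timeout (closeWithTimeout, the cancel-on-timeout policy, and the timeout value are assumptions,
not current client behavior):

{code:java}
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.hbase.client.BufferedMutator;

// Sketch of a bounded-wait close for point 3: give the graceful flush-and-close
// a deadline, and come out (possibly losing buffered Mutations, which the caller
// then at least knows about) if the cluster is down and close() hangs.
public class BoundedClose {
  public static boolean closeWithTimeout(BufferedMutator mutator, long timeoutMs)
      throws IOException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Future<?> f = pool.submit(() -> {
      mutator.close(); // attempts a final flush internally
      return null;
    });
    try {
      f.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;                 // graceful close finished in time
    } catch (TimeoutException e) {
      f.cancel(true);              // give up; Mutations may be lost, but knowingly
      return false;
    } catch (InterruptedException | ExecutionException e) {
      throw new IOException("close failed", e);
    } finally {
      pool.shutdownNow();
    }
  }
}
{code}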

> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>                 Key: HBASE-15436
>                 URL: https://issues.apache.org/jira/browse/HBASE-15436
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.2
>            Reporter: Sangjin Lee
>         Attachments: hbaseException.log, threaddump.log
> We noticed an instance where the thread that was executing a flush ({{BufferedMutatorImpl.flush()}})
got stuck when the (local one-node) cluster shut down and was unable to get out of that stuck state.
> The setup is a single-node HBase cluster, and apparently the cluster went away while the
client was executing flush. The flush eventually logged a failure after 30+ minutes of retrying.
That is understandable.
> What is unexpected is that the thread is stuck in this state (i.e. in the {{flush()}} call).
I would have expected the {{flush()}} call to return after the complete failure.
