hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15837) More gracefully handle a negative memstoreSize
Date Tue, 17 May 2016 03:40:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285967#comment-15285967 ]

Enis Soztutar commented on HBASE-15837:

[~elserj] I was also looking at this to understand what may have happened. The memstore size
discrepancy happens when the index update fails for Phoenix. I also have a patch that should
fix the root cause. 

It is something like this:
 - HRegion.memstoreSize gets updated only after a batch is finished. HStore.getMemstoreSize()
gets updated every time we add a new cell. If the transaction rolls back from the memstore,
we also decrement the size in HStore.
 - HRegion.batchMutate() has the code: 
        long addedSize = doMiniBatchMutation(batchOp);
        long newSize = this.addAndGetGlobalMemstoreSize(addedSize);
Which means that HRegion.memstoreSize will get updated ONLY when doMiniBatchMutation DOES
NOT throw an exception.
 - When doMiniBatchMutation() throws an exception, we normally undo the memstore
operations by rolling back. This ensures that the updates are removed and the HStore size is decremented.
 - However, in the case that postBatchMutate() throws an exception, the WAL has already been sync()'ed,
so we cannot roll back the transaction. The edits are actually in the memstore and the WAL,
but we fail to update the region memstore size. The HStore size is correctly updated, thus leaving
the discrepancy.
 - Phoenix secondary indexing does the scans in the preBatchMutate() call and does the secondary
index writes in the postBatchMutate() call. Thus, if the secondary index writes fail, we end
up with the accounting error and the subsequent regionserver abort at region close.
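The sequence above can be sketched with a toy model (hypothetical, simplified code; the names only mirror the real HBase methods for illustration):

```java
// Hypothetical model of the two accounting paths described above --
// not actual HBase code.
public class MemstoreAccountingSketch {
    long regionMemstoreSize = 0; // HRegion.memstoreSize: updated once per batch
    long storeMemstoreSize = 0;  // HStore size: updated per cell

    long doMiniBatchMutation(long cellBytes, boolean postBatchMutateFails) {
        storeMemstoreSize += cellBytes; // per-cell accounting happens first
        // ... WAL sync happens here; the edits can no longer be rolled back ...
        if (postBatchMutateFails) {
            // e.g. a Phoenix secondary index write failing in postBatchMutate()
            throw new RuntimeException("postBatchMutate failed");
        }
        return cellBytes;
    }

    void batchMutate(long cellBytes, boolean postBatchMutateFails) {
        long addedSize = doMiniBatchMutation(cellBytes, postBatchMutateFails);
        // Only reached when doMiniBatchMutation does NOT throw:
        regionMemstoreSize += addedSize;
    }

    public static void main(String[] args) {
        MemstoreAccountingSketch r = new MemstoreAccountingSketch();
        try {
            r.batchMutate(100, true); // index write fails after WAL sync
        } catch (RuntimeException expected) {
        }
        // The edits sit in the memstore (store size moved), but the region
        // never accounted for them -- the discrepancy described above.
        System.out.println("region=" + r.regionMemstoreSize
                + " store=" + r.storeMemstoreSize);
    }
}
```

Running this prints a region size of 0 against a store size of 100: exactly the mismatch that later trips the close-time sanity check.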

> More gracefully handle a negative memstoreSize
> ----------------------------------------------
>                 Key: HBASE-15837
>                 URL: https://issues.apache.org/jira/browse/HBASE-15837
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0
>         Attachments: HBASE-15837.001.patch
> Over in PHOENIX-2883, I've been trying to figure out how to track down the root cause
of an issue we were seeing where a negative memstoreSize was ultimately causing an RS to abort.
The tl;dr version is
> * Something causes memstoreSize to be negative (not sure what is doing this yet)
> * All subsequent flushes short-circuit and don't run because they think there is no data
to flush
> * The region is eventually closed (commonly, for a move).
> * A final flush is attempted on each store before closing (which also short-circuits for
the same reason), leaving unflushed data in each store.
> * The sanity check that each store's size is zero fails and the RS aborts.
> I have a little patch which I think should improve our failure case around this, safely preventing
the RS abort (by forcing a flush when memstoreSize is negative) and logging a call trace
when an update to memstoreSize makes it negative (to find culprits in the future).
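A rough sketch of both the pre-patch failure mode and the mitigation described here (hypothetical names and structure; this is not the actual HBASE-15837.001.patch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not HBase code: shows why a negative tracked size
// short-circuits flushes, and the patched behavior of flushing anyway
// while logging a call trace to identify the culprit.
public class NegativeMemstoreSketch {
    final AtomicLong memstoreSize = new AtomicLong();

    long addAndGetGlobalMemstoreSize(long delta) {
        long newSize = memstoreSize.addAndGet(delta);
        if (newSize < 0) {
            // Patch behavior: log a call trace so future culprits can be found
            new Exception("memstoreSize went negative: " + newSize)
                    .printStackTrace();
        }
        return newSize;
    }

    // Pre-patch check: a negative size looks like "nothing to flush", so the
    // flush (including the final one at region close) is skipped.
    boolean shouldFlushPrePatch() {
        return memstoreSize.get() > 0;
    }

    // Patched check: a negative size forces a flush instead of skipping it.
    boolean shouldFlushPatched() {
        return memstoreSize.get() != 0;
    }
}
```

With a negative size, the pre-patch check skips every flush and the close-time sanity check aborts the RS; the patched check flushes instead.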

This message was sent by Atlassian JIRA
