hbase-issues mailing list archives

From "James Estes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14177) Full GC on client may lead to missing scan results
Date Wed, 23 Sep 2015 17:22:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904858#comment-14904858 ]

James Estes commented on HBASE-14177:

Looks ok to me, but to be clear, the issue is only fixed if both the client and the server
are on 1.1.0. Any mix of versions (even one that includes 1.1.0) does not fix the issue. This
was important for us to know because we were doing a rolling migration, so we would necessarily
have a mix of versions during the rollout (it was going to be an extended rollout because we had
to upgrade from Hadoop 2.2 to 2.6). However, I'm sure keeping that sort of matrix in JIRA
would be difficult.

> Full GC on client may lead to missing scan results
> --------------------------------------------------
>                 Key: HBASE-14177
>                 URL: https://issues.apache.org/jira/browse/HBASE-14177
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.98.12, 0.98.13, 1.0.2
>            Reporter: James Estes
>            Priority: Critical
>              Labels: dataloss
>             Fix For: 1.0.3, 0.98.16
> After adding a large row, scanning back that row winds up being empty. After a few attempts
> it will succeed (all attempts over the same data on an HBase cluster getting no other writes).
> Looking at logs, it seems this happens when there is memory pressure on the client and
> several Full GCs occur. Then there are messages indicating that region locations
> are being removed from the local client cache:
> 2015-07-31 12:50:24,647 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Removed as a location of big_row_1438368609944,,1438368610048.880c849594807bdc7412f4f982337d6c. for tableName=big_row_1438368609944 from cache
> Blaming the GC may sound fanciful, but if the test is run with -Xms4g -Xmx4g then it
> always passes on the first scan attempt. Maybe the pause is enough to remove something from
> the cache, or the client is using weak references somewhere?
> More info http://mail-archives.apache.org/mod_mbox/hbase-user/201507.mbox/%3CCAE8tVdnFf%3Dob569%3DfJkpw1ndVWOVTkihYj9eo6qt0FrzihYHgw%40mail.gmail.com%3E
> Test used to reproduce:
> https://github.com/housejester/hbase-debugging#fullgctest
> I tested and had failures in:
> 0.98.12 client/server
> 0.98.13 client and 0.98.12 server
> 0.98.13 client/server
> 1.1.0 client and 0.98.13 server
> 0.98.13 client and 1.1.0 server
> 0.98.12 client and 1.1.0 server
> I tested without failure in:
> 1.1.0 client/server
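
For anyone planning a rolling upgrade, the test results listed in the issue description above can be written down as a small lookup table. This is only a sketch that restates those observed results; the names OBSERVED, passing_pairs and mixed_version_pairs_all_fail are made up for illustration, and nothing here is a general compatibility rule for versions that were not tested.

```python
# Observed outcomes copied from the issue description:
# (client version, server version) -> True if the scan test passed without failure.
# Illustrative only; untested combinations are deliberately absent.
OBSERVED = {
    ("0.98.12", "0.98.12"): False,
    ("0.98.13", "0.98.12"): False,
    ("0.98.13", "0.98.13"): False,
    ("1.1.0", "0.98.13"): False,
    ("0.98.13", "1.1.0"): False,
    ("0.98.12", "1.1.0"): False,
    ("1.1.0", "1.1.0"): True,
}


def passing_pairs(results):
    """Return the (client, server) pairs that were tested without failure."""
    return sorted(pair for pair, passed in results.items() if passed)


def mixed_version_pairs_all_fail(results):
    """True if every tested pair with differing client and server versions failed."""
    return all(
        not passed
        for (client, server), passed in results.items()
        if client != server
    )


if __name__ == "__main__":
    print("passed:", passing_pairs(OBSERVED))
    print("all mixed-version pairs failed:", mixed_version_pairs_all_fail(OBSERVED))
```

Keeping the table as plain data makes it easy to add a row when another client/server combination gets tested.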

This message was sent by Atlassian JIRA
