hbase-issues mailing list archives

From "Mikhail Antonov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16232) ITBLL fails on branch-1.3, now loosing actual keys
Date Fri, 26 Aug 2016 23:23:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440174#comment-15440174
] 

Mikhail Antonov commented on HBASE-16232:
-----------------------------------------

A status update and a quick summary. After spending a lot of time chasing this with the different
configurations I've tried, here's my conclusion so far:

 - I can't reproduce it on a large distributed cluster with active Chaos Monkeys and the
default "hbase.test.regions-per-server" (3). This is a fairly rigorous test and a good baseline.
 - I've seen it very occasionally on the same cluster when I set num-regions-per-server
to 100. At that point sporadic test failures start to creep in when I run the loop long
enough, because occasionally some regions get stuck in transition and the load generator or
verifier MR tasks fail with a retries-exhausted exception.
 - I see it more often if I raise the number of regions to something like 300 per server, but
that makes test iterations take longer and longer and the aforementioned task crashes happen
more often, making reproduction really painful and unreliable.
 - With 100 or 300 regions per machine I have also seen this on a build from the tip of what
was then the 1.2.1 branch. That makes me think this might be something old that has been
lurking in the existing code for a long time.
 - To the best of my knowledge, nobody else who ran ITBLL off 1.3 builds on real clusters
([~stack]?) was able to reproduce it, unlike the previous issue with fake keys, which was
reasonably reproducible on a small distributed cluster (I haven't seen any other jiras filed
on that or much activity here).

Based on that, I'm going to lower the priority of this task to Major and make it a non-blocker
for the release, while continuing to look into it in the background, because at this point I
don't know of any reliable and repeatable way to reproduce it in a reasonable amount of time on
a reasonable cluster setup. If anybody has such a repro, by all means please feel free
to step up and pick this one up.

> ITBLL fails on branch-1.3, now loosing actual keys
> --------------------------------------------------
>
>                 Key: HBASE-16232
>                 URL: https://issues.apache.org/jira/browse/HBASE-16232
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, integration tests
>    Affects Versions: 1.3.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> So I'm running ITBLL off branch-1.3 on a recent commit (after [~stack]'s fix for fake keys
> showing up in the scans) with an increased number of regions per regionserver and seeing the
> following.
> {quote} 
> $Verify$Counts	
> REFERENCED	0	4,999,999,994	4,999,999,994
> UNDEFINED	0	3	3
> UNREFERENCED	0	3	3
> {quote}
> So we're losing some keys. This time those aren't fake:
> {quote}
> undef	
> \x89\x10\xE0\xBBx\xF1\xC4\xBAY`\xC4\xD77\x87\x84\x0F	0	1	1
> \x89\x11\x0F\xBA@\x0D8^\xAE \xB1\xCAh\xEB&\xE3	0	1	1
> \x89\x16waxv;\xB1\xE3Z\xE6"|\xFC\xBE\x9A	0	1	1
> unref	
> \x15\x1F*f\x92i6\x86\x1D\x8E\xB7\xE1\xC1=\x96\xEF	0	1	1
> \xF4G\xC6E\xD6\xF1\xAB\xB7\xDB\xC0\x94\xF2\xE7mN\xEC	0	1	1
> U\x0F'\x88\x106\x19\x1C\x87Y"\xF3\xE6\xC1\xC8\x15
> {quote}
> Re-running verify step with CM off still shows this issue. Search tool reports:
> {quote}
> Total
> \x89\x11\x0F\xBA@\x0D8^\xAE \xB1\xCAh\xEB&\xE3	5	0	5
> \x89\x16waxv;\xB1\xE3Z\xE6"|\xFC\xBE\x9A	4	0	4
> CELL_WITH_MISSING_ROW	15	0	15
> {quote}
> Will post more as I dig into it.
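
For readers unfamiliar with the REFERENCED / UNDEFINED / UNREFERENCED counters quoted above,
here is a minimal Python sketch of how a linked-list verifier can classify keys. It is not the
ITBLL code: the function names, the dict-based "table", and the ring size are all made up for
illustration, and the assumption is that each stored row holds a pointer to exactly one other
key. Under that assumption, dropping three rows from an intact ring gives three UNDEFINED and
three UNREFERENCED keys, which is the same shape as the counts in the report.

```python
from collections import Counter


def classify(rows):
    """Classify keys of a toy linked list.

    `rows` maps each stored key to the key it points at (hypothetical model,
    not the real ITBLL schema).
    """
    defined = set(rows)              # keys that exist as rows
    referenced = set(rows.values())  # keys that some row points at
    counts = Counter()
    counts["REFERENCED"] = len(defined & referenced)    # present and pointed at
    counts["UNDEFINED"] = len(referenced - defined)     # pointed at but missing
    counts["UNREFERENCED"] = len(defined - referenced)  # present, nothing points at it
    return counts


def make_ring(n):
    """Build a circular list of n keys where key i points at key i - 1."""
    return {i: (i - 1) % n for i in range(n)}


ring = make_ring(10)
intact = classify(ring)

# Simulate data loss by dropping three rows (keys chosen arbitrarily).
lossy = {k: v for k, v in ring.items() if k not in (2, 5, 8)}
damaged = classify(lossy)

print(dict(intact))
print(dict(damaged))
```

In this toy model each lost row shows up twice: once as an UNDEFINED key (the missing row that
is still pointed at) and once as an UNREFERENCED key (the row the missing one used to point at).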



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
