hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16074) ITBLL fails, reports lost big or tine families
Date Tue, 21 Jun 2016 18:49:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342438#comment-15342438 ]

Elliott Clark commented on HBASE-16074:

So we had a run like this:

Counter      Map   Reduce          Total
REFERENCED   0     1,800,000,000   1,800,000,000

That is the correct number of referenced nodes, but there shouldn't be any unreferenced ones. So we went
into the logs and found:

2016-06-21 04:28:43,314 WARN [main] org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify: Prev is not set for: Y\xD3\x16t\xC5\x9D1@
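To connect that warning to the counters: as we understand the Verify job, the mapper emits every row it scans as "defined" and emits a reference to the key stored in that row's prev column (logging "Prev is not set for" when the column is missing), and the reducer buckets each key by whether it was defined and whether anything points at it. Below is a simplified, hypothetical Java model of that bucketing. The class, method, and enum names are ours, not ITBLL's, and the real reducer handles more cases. Under this model, a row that the scan returns without a prev column, and that no other node points at, lands in UNREFERENCED, which would be consistent with the unreferenced nodes reported in that run.

```java
import java.util.List;

// Hypothetical, simplified model of how the ITBLL Verify reducer buckets a key.
// Not the actual IntegrationTestBigLinkedList code.
public class VerifyClassifier {

    enum Count { REFERENCED, UNREFERENCED, UNDEFINED }

    /**
     * @param defined   true if the mapper saw a row with this key in the table
     * @param referrers keys of rows whose prev column points at this key
     */
    static Count classify(boolean defined, List<String> referrers) {
        if (!defined && referrers.isEmpty()) {
            // The reducer only sees keys that were emitted at least once.
            throw new IllegalArgumentException("key was never emitted");
        }
        if (!defined) {
            return Count.UNDEFINED;    // something points here, but the row is missing
        }
        if (referrers.isEmpty()) {
            return Count.UNREFERENCED; // row was scanned, but nothing points at it
        }
        return Count.REFERENCED;       // row exists and its successor points at it
    }

    public static void main(String[] args) {
        System.out.println(classify(true, List.of("successor-row")));          // REFERENCED
        // e.g. a phantom row with no prev column that no node links to
        System.out.println(classify(true, List.of()));                         // UNREFERENCED
        // a prev pointer to a row that is not in the table (real data loss)
        System.out.println(classify(false, List.of("row-with-dangling-prev"))); // UNDEFINED
    }
}
```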

That row key looks really weird: it's shorter than we would expect.
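The key in the log is printed in the `\xNN`-escaped form produced by HBase's `Bytes.toStringBinary`, so its on-disk length is easy to misjudge by eye. The sketch below decodes it and counts the bytes. `unescape` is a hypothetical stand-in for `Bytes.toBytesBinary`, written so the example has no HBase dependency, and the 16-byte expected key length is an assumption about ITBLL's generator that should be checked against the test source.

```java
import java.io.ByteArrayOutputStream;

public class RowKeyLength {

    // ASSUMPTION: ITBLL's generator writes fixed-length 16-byte random row keys.
    static final int ASSUMED_ITBLL_KEY_LENGTH = 16;

    // Hypothetical helper: turns a \xNN-escaped string (as printed by
    // Bytes.toStringBinary) back into raw bytes. Expects ASCII input.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                int hi = Character.digit(s.charAt(i + 2), 16);
                int lo = Character.digit(s.charAt(i + 3), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // one escaped byte
                    i += 3;                    // skip over "xNN"
                    continue;
                }
            }
            out.write(c); // a printable character stands for itself
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Row key copied from the "Prev is not set for" log line.
        byte[] key = unescape("Y\\xD3\\x16t\\xC5\\x9D1@");
        System.out.println(key.length + " bytes, expected " + ASSUMED_ITBLL_KEY_LENGTH);
        // prints "8 bytes, expected 16"
    }
}
```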

However, it is the split point for a region.
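A split point only has to be a byte string that separates the rows on either side; nothing requires a row with exactly that key to exist. The sketch below assumes the standard HBase convention that a region holds rows with `startKey <= row < endKey`, compared as unsigned bytes, with an empty end key meaning "no upper bound". `inRegion` and the 16-byte padded row are hypothetical illustrations, not HBase APIs or real data from this run.

```java
import java.util.Arrays;

public class RegionBoundary {

    static final byte[] EMPTY = new byte[0];

    // A region covers [startKey, endKey) under unsigned lexicographic byte order.
    // An empty endKey marks the last region of the table (no upper bound).
    static boolean inRegion(byte[] row, byte[] startKey, byte[] endKey) {
        boolean afterStart = Arrays.compareUnsigned(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || Arrays.compareUnsigned(row, endKey) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        // The split point from the log: Y\xD3\x16t\xC5\x9D1@
        byte[] split = {'Y', (byte) 0xD3, 0x16, 't', (byte) 0xC5, (byte) 0x9D, '1', '@'};
        // Hypothetical full-length row sharing that prefix, padded with zero bytes.
        byte[] longer = Arrays.copyOf(split, 16);

        System.out.println(inRegion(longer, split, EMPTY)); // true: sorts after the split key
        System.out.println(inRegion(longer, EMPTY, split)); // false
        System.out.println(inRegion(split, split, EMPTY));  // true: start keys are inclusive
    }
}
```

Because the start key is inclusive, the split key itself belongs to the upper region; whether any row actually has that key is a separate question, which is what the shell check below answers.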


Going into the shell, we found that the row does not exist:

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.0-fb10-SNAPSHOT, rd8d63d67152af8eed48f8863a0e13d3e71fc097c, Fri Jun 10 16:59:00 PDT 2016

hbase(main):001:0> get 'IntegrationTestBigLinkedList.11', "Y\xD3\x16t\xC5\x9D1@"
COLUMN                CELL
0 row(s) in 0.3390 seconds

That got us very worried about data loss, so we re-ran the verify step. After stopping the
chaos monkey and letting everything settle, we got a clean verify:

Counter      Map   Reduce          Total
REFERENCED   0     1,800,000,000   1,800,000,000

> ITBLL fails, reports lost big or tine families
> ----------------------------------------------
>                 Key: HBASE-16074
>                 URL: https://issues.apache.org/jira/browse/HBASE-16074
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.3.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>            Priority: Blocker
>             Fix For: 1.3.0
> Underlying MR jobs succeed but I'm seeing the following in the logs (mid-size distributed test cluster):
> ERROR test.IntegrationTestBigLinkedList$Verify: Found nodes which lost big or tiny families,
> I do not know exactly yet whether it's a bug, a test issue or an env setup issue, but I need to figure it out. Opening this to raise awareness and see if someone has seen this recently.

This message was sent by Atlassian JIRA
