hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16074) ITBLL fails, reports lost big or tiny families
Date Tue, 21 Jun 2016 18:49:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342438#comment-15342438 ]

Elliott Clark commented on HBASE-16074:
---------------------------------------

So we had a run like this:

{code}
REFERENCED	0	1,800,000,000	1,800,000,000
UNREFERENCED	0	76	76
{code}


The referenced count is correct, but there should not be any unreferenced nodes. So we went
into the logs and found:

{code}
2016-06-21 04:28:43,314 WARN [main] org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify: Prev is not set for: Y\xD3\x16t\xC5\x9D1@
{code}

That row key looks really weird: it is shorter than the row keys we would expect.
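
As a rough check of the length, the escaped key from the warning can be decoded by hand. The sketch below uses a hypothetical helper, `decode_hbase_binary`, that mimics the `\xNN` escaping HBase prints (the real Java counterpart is `Bytes.toBytesBinary`). The 16-byte expected length is an assumption used for illustration; it is not stated in this comment.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def decode_hbase_binary(escaped: str) -> bytes:
    """Decode a key printed with \\xNN escapes back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        is_escape = (
            escaped[i] == "\\"
            and i + 3 < len(escaped)
            and escaped[i + 1] == "x"
            and escaped[i + 2] in HEX_DIGITS
            and escaped[i + 3] in HEX_DIGITS
        )
        if is_escape:
            # \xNN stands for a single byte with hex value NN
            out.append(int(escaped[i + 2:i + 4], 16))
            i += 4
        else:
            # Printable characters stand for themselves
            out.append(ord(escaped[i]))
            i += 1
    return bytes(out)

ASSUMED_ROWKEY_LENGTH = 16  # assumption, not taken from the comment

key = decode_hbase_binary(r"Y\xD3\x16t\xC5\x9D1@")
print(len(key))                          # 8
print(len(key) < ASSUMED_ROWKEY_LENGTH)  # True
```

The key decodes to 8 bytes, which is consistent with it being shorter than a normally generated row key.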

However, it is the split point for a region:
{code}
IntegrationTestBigLinkedList.11,Y\xD3\x16t\xC5\x9D1@,1466506812220.15898a252e1b54728dd44a2b13fca290.

{code}
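
The match is easy to see once the region name is pulled apart. The sketch below assumes the usual `<table>,<start key>,<region id>.<encoded name>.` layout of a region name; `split_region_name` is a hypothetical helper, not an HBase API, and it ignores variants such as replica ids.

```python
def split_region_name(name: str):
    """Split a region name into (table, start key, region id, encoded name).

    Assumed layout: <table>,<start key>,<region id>.<encoded name>.
    The start key may itself contain commas, so split on the first and
    last comma only.
    """
    table, rest = name.split(",", 1)
    start_key, tail = rest.rsplit(",", 1)
    region_id, encoded_name, _ = tail.split(".", 2)
    return table, start_key, int(region_id), encoded_name

region = (
    "IntegrationTestBigLinkedList.11,"
    r"Y\xD3\x16t\xC5\x9D1@,"
    "1466506812220.15898a252e1b54728dd44a2b13fca290."
)
table, start_key, region_id, encoded_name = split_region_name(region)

# The region's start key is the same escaped string as in the warning.
print(start_key == r"Y\xD3\x16t\xC5\x9D1@")  # True
```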

Going into the shell, we found that the row does not exist.

{code}
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.0-fb10-SNAPSHOT, rd8d63d67152af8eed48f8863a0e13d3e71fc097c, Fri Jun 10 16:59:00 PDT 2016

hbase(main):001:0> get 'IntegrationTestBigLinkedList.11', "Y\xD3\x16t\xC5\x9D1@"
COLUMN                                                                           CELL
0 row(s) in 0.3390 seconds
{code}

That got us very worried about data loss, so we re-ran the verify step. With the chaos monkey
stopped and everything allowed to settle, we got a clean verify step.

{code}
REFERENCED	0	1,800,000,000	1,800,000,000
{code}
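
For readers less familiar with the counters, the following is a simplified toy model of how a verify pass might classify nodes. It is not the actual MapReduce Verify job; the function name, the dict-based input, and the three counter names used here are assumptions made for illustration.

```python
from collections import Counter

def classify(rows):
    """rows maps each row key to its prev key (None when prev is not set)."""
    # Count how many rows point at each key through their prev link.
    refs = Counter(prev for prev in rows.values() if prev is not None)
    counts = Counter()
    for key in rows:
        # A stored row is REFERENCED if some other row points at it.
        counts["REFERENCED" if refs[key] > 0 else "UNREFERENCED"] += 1
    for key in refs:
        # A key that is pointed at but never stored is UNDEFINED.
        if key not in rows:
            counts["UNDEFINED"] += 1
    return counts

# A closed loop: every node is referenced exactly once.
loop = {"a": "c", "b": "a", "c": "b"}
print(classify(loop))

# An extra row with no prev that nothing points at shows up as UNREFERENCED.
with_stray = dict(loop, stray=None)
print(classify(with_stray))
```

In this model a clean run has only REFERENCED nodes, which is what the second verify reported.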


> ITBLL fails, reports lost big or tiny families
> ----------------------------------------------
>
>                 Key: HBASE-16074
>                 URL: https://issues.apache.org/jira/browse/HBASE-16074
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.3.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Underlying MR jobs succeed but I'm seeing the following in the logs (mid-size distributed test cluster):
> ERROR test.IntegrationTestBigLinkedList$Verify: Found nodes which lost big or tiny families, count=164
> I do not know exactly yet whether it's a bug, a test issue or env setup issue, but I need to figure it out. Opening this to raise awareness and see if someone saw that recently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
