hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12782) ITBLL fails for me if generator does anything but 5M per maptask
Date Tue, 13 Jan 2015 04:33:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274693#comment-14274693 ]

stack commented on HBASE-12782:
-------------------------------

bq. yeah, debugging ITBLL has proven to be very hard. What I had done previously was to keep
all the files and WALs and do a custom search on top of that.

Let me try and make some tools.  The failure only seems to come at scale, which is a pain to debug.

In my weekend messing around, I was hoping that a pointed replication of the set of failures seen
during a 'suspicious' section of client retries would narrow the debug surface, especially if I
could do it in a unit test.  What I found was that a high-fidelity reproduction of the exceptions
thrown, with retries taken to the extreme, was still insufficient to cause data loss in a unit
test environment. Redoing the unit test as an IT test to get real cluster timings into the mix
gave, again, no cigar, not unless the numbers are large (100M+) -- but then I was back in
the big original ITBLL space, trying to trace the ghost of missing rows.

Let me do the WAL search tool.

> ITBLL fails for me if generator does anything but 5M per maptask
> ----------------------------------------------------------------
>
>                 Key: HBASE-12782
>                 URL: https://issues.apache.org/jira/browse/HBASE-12782
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: 12782.unit.test.and.it.test.txt, 12782.unit.test.writing.txt
>
>
> Anyone else seeing this?  If I do an ITBLL with generator doing 5M rows per maptask,
> all is good -- verify passes. I've been running 5 servers and had one slot per server.  So
> below works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" ./hadoop/bin/hadoop --config ~/conf_hadoop org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey serverKilling Generator 5 5000000 g1.tmp
> or if I double the map tasks, it works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" ./hadoop/bin/hadoop --config ~/conf_hadoop org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey serverKilling Generator 10 5000000 g2.tmp
> ...but if I change the 5M to 50M or 25M, Verify fails.
> Looking into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
