hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-12782) ITBLL fails for me if generator does anything but 5M per maptask
Date Sat, 10 Jan 2015 20:21:34 GMT

     [ https://issues.apache.org/jira/browse/HBASE-12782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-12782:
--------------------------
    Attachment: 12782.unit.test.writing.txt

Focusing on the write side first.

Debugging, the output emitted at the end of the verify step is of no use. I find that I have to go into
the reduce logging to find these log lines from ITBLL:

          LOG.error("Linked List error: Key = " + keyString + " References = " + refsSb.toString());

I then take the 'References' record and do a get on it.  It is the 'meta:previous' column that is 'missing'.
The missing record will have been 'written' as part of the previous 1M writes, at 'count' - 1M. The timestamp
on this record will be '1M' ahead of when the 'missing' record would have been written (usually about
15 seconds per 1M but, if a server is down, the 1M can take minutes to write).
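
For doing that check by hand, here is a minimal sketch of the get with the Java client -- not anything from the patch, just an assumption-laden helper. The table name below is ITBLL's default, the column is assumed to be family 'meta', qualifier 'prev' (the 'meta:previous' above), and the row key is taken from the 'References =' part of the reducer log line in Bytes.toStringBinary form:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CheckMissingRef {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Default ITBLL table name; pass a different one if the run overrode it.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("IntegrationTestBigLinkedList"))) {
          // args[0] is the row from the 'References =' log output, in toStringBinary form.
          byte[] row = Bytes.toBytesBinary(args[0]);
          Result r = table.get(new Get(row));
          byte[] prev = r.getValue(Bytes.toBytes("meta"), Bytes.toBytes("prev"));
          long ts = r.isEmpty() ? -1L : r.rawCells()[0].getTimestamp();
          System.out.println("prev = " + (prev == null ? "MISSING" : Bytes.toStringBinary(prev))
              + ", ts = " + ts);
        }
      }
    }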

The ITBLL row keys have too many awkward characters -- quotes, single ticks, left braces, unprintables,
etc. -- to make for easy scripting.  I tried, but it's kinda tough bridging the 'text' output -- escaped
bytes -- between jruby and java.  I spent some time trying to write rows with printable keys, but that
seems to make for more failures; need to spend time on this... As is, it's hard to script up ITBLL
failures so I can get a 'bigger picture' of the failure profile.  That's another issue.
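
For what it is worth, the Java side of that escaped-bytes bridge is just Bytes.toStringBinary/Bytes.toBytesBinary (assuming the reducer logs keys in toStringBinary form, which is what the snippet above suggests). A tiny round-trip sketch with a made-up key:

    import org.apache.hadoop.hbase.util.Bytes;

    public class EscapedKeyRoundTrip {
      public static void main(String[] args) {
        // A key as it shows up in the text output: printable ASCII plus \xNN escapes (made-up example).
        String logged = "\\x00\\x1BqZ{\\xFF";
        byte[] raw = Bytes.toBytesBinary(logged);        // logged text -> raw row key bytes
        String roundTripped = Bytes.toStringBinary(raw); // raw bytes -> logged text form
        System.out.println(roundTripped.equals(logged)); // prints true if the escaping round-trips
      }
    }

The same Bytes calls are reachable from the jruby hbase shell, which may be less painful than quoting the escapes in bash.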

I've disabled killing the master and disabled splits to make things easier for myself. We still fail reliably.

I can triangulate a little by looking at a few failed records, and I have identified suspicious-looking
write periods where asyncprocess is trying to cross over a failed regionserver.  The attached test
reproduces in a unit test the same logging sequence I see up in the cluster (I was trying to narrow the
moving parts around a failure), but it looks like asyncprocess is not the issue; its accounting doesn't
seem to be hiccuping.
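
For reference only -- this is not the attached test, just the rough shape of a write-then-verify check under made-up assumptions (table 'linkcheck' pre-created with a 'meta' family, sequential integer keys instead of ITBLL's random ones):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WriteThenVerifySketch {
      static final byte[] FAMILY = Bytes.toBytes("meta");
      static final byte[] PREV = Bytes.toBytes("prev");

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tn = TableName.valueOf("linkcheck"); // made-up table, pre-created with a 'meta' family
        int n = 1000000;
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
          // Write n rows, each carrying a back-pointer to the previous row's key.
          byte[] prevKey = Bytes.toBytes(-1);
          try (BufferedMutator mutator = conn.getBufferedMutator(tn)) {
            for (int i = 0; i < n; i++) {
              byte[] key = Bytes.toBytes(i);
              Put p = new Put(key).addColumn(FAMILY, PREV, prevKey);
              mutator.mutate(p);
              prevKey = key;
            }
          } // close() flushes whatever is still buffered
          // Verify every row landed and still carries its back-pointer.
          try (Table table = conn.getTable(tn)) {
            for (int i = 0; i < n; i++) {
              Result r = table.get(new Get(Bytes.toBytes(i)));
              if (r.isEmpty() || r.getValue(FAMILY, PREV) == null) {
                System.err.println("Missing row or meta:prev at " + i);
              }
            }
          }
        }
      }
    }

Run against a real cluster with a monkey killing servers, the interesting part is whether the verify loop ever finds a hole after all the flushes succeeded.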

Let me redo this test as an integration test run against the cluster to be sure -- perhaps it's a timing
thing that is hard to repro in the one JVM -- but it doesn't look like the write side is the issue.  Dang.

> ITBLL fails for me if generator does anything but 5M per maptask
> ----------------------------------------------------------------
>
>                 Key: HBASE-12782
>                 URL: https://issues.apache.org/jira/browse/HBASE-12782
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: 12782.unit.test.writing.txt
>
>
> Anyone else seeing this?  If I do an ITBLL with generator doing 5M rows per maptask, all is good -- verify passes. I've been running 5 servers and had one slot per server.  So below works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" ./hadoop/bin/hadoop --config ~/conf_hadoop org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey serverKilling Generator 5 5000000 g1.tmp
> or if I double the map tasks, it works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" ./hadoop/bin/hadoop --config ~/conf_hadoop org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey serverKilling Generator 10 5000000 g2.tmp
> ...but if I change the 5M to 50M or 25M, Verify fails.
> Looking into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
