From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10079) Increments lost after flush
Date Thu, 05 Dec 2013 01:05:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839634#comment-13839634 ]

Jonathan Hsieh commented on HBASE-10079:
----------------------------------------

I'm having a hard time recreating the jagged counts.  I tried reverting patches, and testing
both before and after the patch nkeywal provided.  I think the flush problem was a red herring;
I was biased by the customer problem I was recently working on.

When I changed my tests to do 100000 increments, the pattern really jumped out.  Looking
at the original numbers from this morning, I see the same pattern with the 250000 increments.
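
For context, the shape of the test is roughly this (a minimal sketch; the table/family/qualifier
names and the HTable-per-thread wiring are my assumptions, not the actual IncrementBlaster code):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the blaster pattern: N client threads, each doing total/N
// increments against one cell, then verify the sum afterwards.
public class IncrementBlasterSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final byte[] row = Bytes.toBytes("the-row");     // assumed names
    final byte[] fam = Bytes.toBytes("f");
    final byte[] qual = Bytes.toBytes("count");
    final int threads = 80, total = 250000;
    final int perThread = total / threads;           // 3125

    List<Thread> workers = new ArrayList<Thread>();
    for (int i = 0; i < threads; i++) {
      workers.add(new Thread(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "incr_test");  // one table handle per thread
            for (int n = 0; n < perThread; n++) {
              table.incrementColumnValue(row, fam, qual, 1L);
            }
            table.close();
          } catch (Exception e) {
            // a thread that dies early loses its 3125 increments
            e.printStackTrace();
          }
        }
      }));
    }
    for (Thread t : workers) t.start();
    for (Thread t : workers) t.join();
    // Verification (the IncrementVerifier role): read the cell back and
    // compare to `total`; a count short by k*3125 suggests k dead threads.
  }
}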
 

80 threads, 250000 increments == 3125 increments / thread.
count = 246875 != 250000 (flush)   // one thread failed to start.
count = 243750 != 250000 (kill)    // two threads failed to start.
count = 246878 != 250000 (kill -9) // one thread failed to start, and 3 threads sent increments
that succeeded on the server but were retried because the kill -9 dropped the ack.
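
Working the arithmetic (same numbers as above):

  250000 - 1 * 3125     = 246875   (flush: one thread's worth missing)
  250000 - 2 * 3125     = 243750   (kill: two threads' worth missing)
  250000 - 1 * 3125 + 3 = 246878   (kill -9: one thread's worth missing, plus 3 increments
                                    applied twice because the ack was lost and the client retried)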

The last one threw us off because it wasn't a regular multiple, but I think the explanation
above makes sense.

I'm looking into whether my test code is bad (is there TableName documentation I ignored that
says the race in the stack trace is my fault?) or whether we need to add some synchronization
to this createTableNameIfNecessary method.
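
For illustration, roughly the kind of guard I have in mind (a minimal sketch; the stand-in
TableName class and cache field here are my assumptions, not the actual implementation):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: serialize create-if-absent on a single lock so two
// threads can never construct and publish an entry for the same name at once.
public final class TableNameInterner {

  // Stand-in value class for illustration, not HBase's TableName.
  static final class TableName {
    final String name;
    TableName(String name) { this.name = name; }
  }

  private static final Map<String, TableName> CACHE =
      new HashMap<String, TableName>();

  public static TableName createTableNameIfNecessary(String name) {
    synchronized (CACHE) {
      TableName existing = CACHE.get(name);  // check under the lock
      if (existing == null) {
        existing = new TableName(name);      // construct once, publish safely
        CACHE.put(name, existing);
      }
      return existing;
    }
  }
}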



> Increments lost after flush 
> ----------------------------
>
>                 Key: HBASE-10079
>                 URL: https://issues.apache.org/jira/browse/HBASE-10079
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.96.1
>            Reporter: Jonathan Hsieh
>            Priority: Blocker
>             Fix For: 0.96.1
>
>         Attachments: 10079.v1.patch
>
>
> Testing 0.96.1rc1.
> With one process incrementing a single column of a single row in a table, we flush or do
> kill / kill -9 and data is lost.  Flush and kill are likely the same problem (a kill would
> flush); kill -9 may or may not have the same root cause.
> 5 nodes
> hadoop 2.1.0 (a pre cdh5b1 hdfs).
> hbase 0.96.1 rc1 
> Test: 250000 increments on a single row and a single col with a varying number of client
> threads (IncrementBlaster).  Verify we have a count of 250000 after the run (IncrementVerifier).
> Run 1: no fault injection.  5 runs.  count = 250000 on multiple runs.  Correctness verified.  1638 inc/s throughput.
> Run 2: flush of the table holding the incremented row.  count = 246875 != 250000.  Correctness failed.  1517 inc/s throughput.
> Run 3: kill of the rs hosting the incremented row.  count = 243750 != 250000.  Correctness failed.  1451 inc/s throughput.
> Run 4: one kill -9 of the rs hosting the incremented row.  count = 246878 != 250000.  Correctness failed.  1395 inc/s throughput (including recovery).



