hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8031) Adopt goraci as an Integration test
Date Tue, 12 Mar 2013 22:47:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600581#comment-13600581 ]

Enis Soztutar commented on HBASE-8031:
--------------------------------------

Thanks Keith for chiming in. 
bq. I am not positive, but it seems like this patch contains the change that we had a discussion
[1] about on github.
Indeed. Your concerns are valid, but I find this implementation simpler.
bq. I think in some situations a mapper rewriting the same data, because the task failed
previously, could cover up the fact that data was lost in HBase/Accumulo. Since I created
the test to detect data loss, the change bothers me a bit. Granted the situation seems unlikely...
In case of data loss in HBase, two things must both happen for the lost data to be rewritten:

 (1) Some of the map tasks must also fail. 
 (2) The failed map task must contain the rows that were lost. 

(1) Map task failures and RS data loss should be independent, and we can rely on (1) not happening that often.
Even if we lose a node running both an RS and a TT, since the region and data distribution is balanced, we should
still detect the data loss via the data on that RS that was written by other TTs.
(2) This is also highly unlikely. Plus, for Loop, we are writing data incrementally, which
means a re-run task can only rewrite the data within the same iteration. The Verify step checks all the
data, not just the current iteration's, so having a large loop count further mitigates this
problem. 
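The incremental-write argument in (2) can be sketched in a few lines of Python (a hypothetical in-memory stand-in, not the actual MapReduce jobs): a re-run mapper rewrites only its own iteration's rows, so a row lost from an earlier iteration still shows up as missing when Verify scans the whole table.

```python
# Hypothetical sketch of the Loop test's incremental writes: each
# iteration's mapper appends its own batch of rows; a failed-and-rerun
# mapper can only rewrite rows of its own iteration.
def run_loop(iterations=3, rows_per_iter=4):
    table = set()   # the "HBase table"
    written = []    # what each iteration's mapper produced
    for it in range(iterations):
        batch = {(it, r) for r in range(rows_per_iter)}
        written.append(batch)
        table |= batch
    return table, written

table, written = run_loop()
lost = (0, 0)                 # a row from iteration 0 is lost in the store
table.discard(lost)
table |= written[2]           # iteration 2's mapper fails and re-runs,
                              # rewriting only iteration 2's rows
missing = set().union(*written) - table
assert missing == {lost}      # Verify (all iterations) still sees the hole
```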
                
> Adopt goraci as an Integration test
> -----------------------------------
>
>                 Key: HBASE-8031
>                 URL: https://issues.apache.org/jira/browse/HBASE-8031
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.95.0, 0.98.0, 0.94.7
>
>         Attachments: hbase-8031_v1.patch
>
>
> As you might know, I am a big fan of the goraci test that Keith Turner has developed,
which in turn is inspired by the Accumulo test called Continuous Ingest. 
> As much as I hate to say it, having to rely on gora and an external github library makes
using this test cumbersome. And lately we had to use it for testing against secure clusters
and with Hadoop2, which gora does not support for now. 
> So, I am proposing we add this test as an IT in the HBase code base so that all HBase
devs can benefit from it.
> The original source code can be found here:
>  * https://github.com/keith-turner/goraci
>  * https://github.com/enis/goraci/
> From the javadoc:
> {code}
> Apache Accumulo [0] has a simple test suite that verifies that data is not
>  * lost at scale. This test suite is called continuous ingest. This test runs
>  * many ingest clients that continually create linked lists containing 25
>  * million nodes. At some point the clients are stopped and a map reduce job is
>  * run to ensure no linked list has a hole. A hole indicates data was lost.
>  *
>  * The nodes in the linked list are random. This causes each linked list to
>  * spread across the table. Therefore if one part of a table loses data, then it
>  * will be detected by references in another part of the table.
>  *
> * Below is a rough sketch of how data is written. For specific details look at
>  * the Generator code.
>  *
>  * 1 Write out 1 million nodes
>  * 2 Flush the client
>  * 3 Write out 1 million that reference previous million
>  * 4 If this is the 25th set of 1 million nodes, then update 1st set of million to point to last
>  * 5 goto 1
>  *
>  * The key is that nodes only reference flushed nodes. Therefore a node should
>  * never reference a missing node, even if the ingest client is killed at any
>  * point in time.
>  *
>  * Some ASCII art time:
>      * [ . . . ] represents one batch of random longs of length WIDTH
>      *
>      *                _________________________
>      *               |                  ______ |
>      *               |                 |      ||
>      *             __+_________________+_____ ||
>      *             v v                 v     |||
>      * first   = [ . . . . . . . . . . . ]   |||
>      *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
>      *             | | | | | | | | | | |     |||
>      * prev    = [ . . . . . . . . . . . ]   |||
>      *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
>      *             | | | | | | | | | | |     |||
>      * current = [ . . . . . . . . . . . ]   |||
>      *                                       |||
>      * ...                                   |||
>      *                                       |||
>      * last    = [ . . . . . . . . . . . ]   |||
>      *             | | | | | | | | | | |-----|||
>      *             |                 |--------||
>      *             |___________________________|
> {code}
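The write/verify pattern in the javadoc above can be sketched in Python (a hypothetical in-memory stand-in for the Generator and Verify jobs; batch sizes are scaled down from the real 1 million, and node keys are random longs as in the test):

```python
import random

def generate(width=1000, sets=25, seed=0):
    """Build 'sets' batches of random-keyed nodes; each batch references
    the previously flushed batch, and the first batch is updated to point
    at the last, closing the loop (steps 1-5 above, scaled down)."""
    rnd = random.Random(seed)
    table = {}          # key -> referenced key (None until linked)
    first = prev = None
    for _ in range(sets):
        current = [rnd.getrandbits(63) for _ in range(width)]
        for i, key in enumerate(current):
            # a node only references nodes in the already-flushed batch
            table[key] = prev[i] if prev is not None else None
        if first is None:
            first = current
        prev = current  # "flush": this batch may now be referenced
    # step 4: point the first set at the last, closing the loop
    for i, key in enumerate(first):
        table[key] = prev[i]
    return table

def find_holes(table):
    """Verify step: any referenced key missing from the table is a hole,
    i.e. data loss."""
    return [ref for ref in table.values()
            if ref is not None and ref not in table]

table = generate()
assert find_holes(table) == []   # intact table has no holes
lost = next(iter(table))
del table[lost]                  # simulate losing one row
assert lost in find_holes(table) # the verify step detects it
```

Because every node is referenced by a node in a different, already-flushed batch, deleting any row leaves a dangling reference somewhere else in the table, which is what makes loss anywhere detectable.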

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
