hbase-issues mailing list archives

From "Keith Turner (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
Date Thu, 12 Apr 2012 15:33:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252491#comment-13252491 ]

Keith Turner commented on HBASE-5754:
-------------------------------------

The counts for the 1B run seem odd to me, but maybe that's just an artifact of how many map
tasks you ran for the generator and how much data each task generated.  If a map task does
not generate a multiple of 25,000,000 nodes, then it will leave some nodes unreferenced.  It
generates a circular linked list every 25M nodes.

{noformat}
12/04/12 03:54:11 INFO mapred.JobClient:     REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient:     UNREFERENCED=1040000000
{noformat}

If you were to run 10 map tasks, each generating 100M nodes, then this should generate 1B
nodes with all of them referenced.  Minimizing the number of unreferenced nodes is ideal,
because the test cannot detect the loss of unreferenced nodes.  I should probably add this
info to the README.
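
As a rough way to check a generator configuration before running it, the arithmetic above can be sketched in a few lines of Python. This is not part of goraci; the function names are hypothetical, and the only fact taken from this comment is the 25M loop size. It counts the nodes each map task writes beyond its last complete circular list. How many of those leftover nodes actually end up counted as UNREFERENCED depends on generator internals not described here.

```python
# Hypothetical helper, not part of goraci.
# Assumption from the comment above: each map task closes a circular
# linked list every 25,000,000 nodes.
LOOP_SIZE = 25_000_000


def leftover_nodes(nodes_per_task: int, loop_size: int = LOOP_SIZE) -> int:
    """Nodes one map task writes beyond its last complete loop."""
    return nodes_per_task % loop_size


def plan(num_tasks: int, nodes_per_task: int) -> tuple[int, int]:
    """Return (total nodes, nodes outside any complete loop) for a run."""
    total = num_tasks * nodes_per_task
    leftover = num_tasks * leftover_nodes(nodes_per_task)
    return total, leftover


if __name__ == "__main__":
    # The configuration suggested above: 10 map tasks x 100M nodes each.
    print(plan(10, 100_000_000))
    # A configuration whose per-task count is not a multiple of 25M.
    print(plan(16, 62_500_000))
```

A leftover count of zero means every map task finishes on a loop boundary, which is the case the 10 x 100M suggestion is aiming for.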

                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner rewrote the Accumulo continuous ingest test using Gora, which has both
> HBase and Accumulo back-ends.
> I put a billion entries into HBase and ran the Verify map/reduce job.  The verification
> failed because about 21K entries were missing.  The goraci
> [README|https://github.com/keith-turner/goraci] explains the test and how it detects
> missing data.
> I re-ran the test with 100 million entries, and it verified successfully.
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the problem is
> probably not in the verify step.
> Here are the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
