hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2341) Suite of test scripts that a.) load a cluster with a verifiable dataset and b.) do random kills of regionserver+datanodes in small cluster
Date Thu, 08 Apr 2010 05:48:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854822#action_12854822 ]

Todd Lipcon commented on HBASE-2341:
------------------------------------

I've started work on some Python-based fault injection scripts here: http://github.com/toddlipcon/gremlins

The work is very preliminary, and I plan on continuing to develop it over the next couple
of weeks, but would be happy to have other people contribute.

Once it's more complete, we could look at including it directly in the HBase source,
though it's generally useful, so I plan to keep it on GitHub as well.
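
As an illustration only (this is not code from the gremlins repository), a random-kill driver of the kind the issue asks for could be sketched in Python as below. The names plan_kills and run_plan are hypothetical, and the kill action is passed in as a function because how a regionserver or datanode actually gets killed (ssh, a signal, an init script) is an assumption left to the caller.

```python
import random


def plan_kills(nodes, duration_s, mean_interval_s, seed=None):
    """Return a time-ordered list of (offset_seconds, node) kill events.

    Gaps between kills are drawn from an exponential distribution, so the
    kills land at random moments with the given mean spacing. A fixed seed
    makes a run reproducible when a failure needs to be replayed.
    """
    rng = random.Random(seed)
    plan = []
    t = rng.expovariate(1.0 / mean_interval_s)
    while t < duration_s:
        plan.append((t, rng.choice(nodes)))
        t += rng.expovariate(1.0 / mean_interval_s)
    return plan


def run_plan(plan, kill, sleep):
    """Walk the plan, sleeping until each event and then killing its node.

    kill(node) and sleep(seconds) are supplied by the caller (hypothetical
    hooks): time.sleep and an ssh wrapper in a real run, fakes in a dry run.
    """
    now = 0.0
    for when, node in plan:
        sleep(when - now)
        kill(node)
        now = when
```

Passing time.sleep as sleep and a function that kills the chosen daemon as kill would run the plan against a real cluster; passing list.append for both gives a dry run that only records what would have happened.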

> Suite of test scripts that a.) load a cluster with a verifiable dataset and b.) do random kills of regionserver+datanodes in small cluster
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2341
>                 URL: https://issues.apache.org/jira/browse/HBASE-2341
>             Project: Hadoop HBase
>          Issue Type: Task
>            Reporter: stack
>             Fix For: 0.20.5, 0.21.0
>
>         Attachments: count-slaves.rb, HBASE-2341-0.20.3.patch, test.sh, VerifiableEditor.java, VerifiableEditor.java
>
>
> We just filed hbase-2340 but discussion up on IRC has it that we need something more
> hardcore than pussy-footing inside a single JVM as hdfs-2340 does.  The point was made (tlipcon)
> that it's hard to ensure real recovery is working if all is in the one JVM.
> So, this issue is about scripts that can:
> + Load a cluster with a dataset that we can 'verify', as in we can tell if it has holes
> in it, i.e. if data has been lost.
> + Script that does a random kill of a random node on some random occasion
> + Script that can check the cluster for data loss
> All of the above should work while the cluster is under load.
> The above would not sit under junit.
> This looks like a suite that we'd want to run up in EC2 using Andrew's scripts and our
> donated AWS credits.
> {code}
> 16:12 < tlipcon> here's my goal: we have a 5 node cluster in the back room. I want to run hbase on that at near full load for a week straight while some process goes around screwing with it
> 16:12 < tlipcon> then I want to verify that I didn't lose a single edit over that week
> {code}
> {code}
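
The "verifiable dataset" idea in the description above can be sketched as follows. This is a minimal illustration, not the attached VerifiableEditor.java: the key layout, the use of SHA-256, and the names make_row, load and verify are all assumptions, and a plain dict stands in for the HBase table. The point is that every row is derivable from a writer id and a sequence number, so a checker can recompute what should exist and report holes.

```python
import hashlib


def make_row(writer_id, seq):
    """Derive a (key, value) pair deterministically from writer id and sequence."""
    key = "%s-%010d" % (writer_id, seq)
    value = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, value


def load(store, writer_id, count):
    """Write rows 0..count-1 for one writer. `store` is a dict stand-in for a table."""
    for seq in range(count):
        key, value = make_row(writer_id, seq)
        store[key] = value


def verify(store, writer_id, count):
    """Recompute every expected row and report missing and corrupt keys."""
    missing, corrupt = [], []
    for seq in range(count):
        key, expected = make_row(writer_id, seq)
        actual = store.get(key)
        if actual is None:
            missing.append(key)      # a hole: the edit was lost
        elif actual != expected:
            corrupt.append(key)      # present, but the value is wrong
    return missing, corrupt
```

In a real run the loader would record how many edits were acknowledged, and verify would be called with that count once the kills have stopped; an empty missing list and an empty corrupt list would mean no acknowledged edit was lost.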

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

