cassandra-commits mailing list archives

From "Russ Hatch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8654) Data validation test
Date Tue, 20 Jan 2015 19:22:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284249#comment-14284249
] 

Russ Hatch commented on CASSANDRA-8654:
---------------------------------------

One notion I have explored is doing this from dtest using a simple log of row contents (on
disk). My prototype uses the datahelp.py functionality in dtest to create data in C* and also
maintains a log, which is used as the authority on what the DB rows should look like. I can
expand on this idea further, but it does have some drawbacks in its present state (it would
take some work to make it really useful).

This is incomplete, but in a very basic sense the dtest would look a bit like this: https://github.com/riptano/cassandra-dtest/blob/experimental_datatool/paging_test.py#L589
Create a log object of some kind, make a call to create a bunch of data, passing in the log
so the data creation code can log expected DB state.
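
As a rough illustration of that flow (not the prototype's actual code), here is a minimal sketch. The names ExpectedStateLog, create_data and verify are hypothetical, and a plain dict stands in for the C* cluster so the example runs on its own.

```python
import random


class ExpectedStateLog:
    """Hypothetical record of what each row should contain."""

    def __init__(self):
        self._rows = {}

    def record(self, key, row):
        # A later write to the same key replaces the earlier one, like an upsert.
        self._rows[key] = dict(row)

    def items(self):
        return self._rows.items()


def create_data(write_row, log, count, seed=0):
    """Write `count` rows through `write_row` and record each one in `log`."""
    rng = random.Random(seed)
    for key in range(count):
        row = {"id": key, "value": rng.randint(0, 1000)}
        write_row(key, row)
        log.record(key, row)


def verify(read_row, log):
    """Return the keys whose stored row differs from the logged expectation."""
    return [key for key, expected in log.items() if read_row(key) != expected]


# Stand-in for the cluster: a dict instead of a real C* session (assumption).
fake_db = {}
log = ExpectedStateLog()
create_data(fake_db.__setitem__, log, count=100)
mismatches = verify(fake_db.get, log)
```

In a real dtest the write_row and read_row callables would wrap session queries; the log stays the authority either way.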

The other notion in this prototype was to make the logging pluggable, so if we're testing
a smaller dataset we could plug in an in-memory log instead of a disk-based one: https://github.com/riptano/cassandra-dtest/blob/experimental_datatool/datahelp.py#L158
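
A pluggable log could be as simple as two classes sharing one small interface. The sketch below is only an assumption about the shape of such a design; MemoryLog, DiskLog and create_rows are hypothetical names, not what datahelp.py actually defines.

```python
import json
import os
import tempfile


class MemoryLog:
    """Keeps expected rows in a dict; suited to small datasets."""

    def __init__(self):
        self._rows = {}

    def record(self, key, row):
        self._rows[key] = row

    def expected_rows(self):
        return dict(self._rows)


class DiskLog:
    """Appends expected rows to a JSON-lines file; suited to larger datasets."""

    def __init__(self, path):
        self._path = path
        open(path, "a").close()  # make sure the file exists

    def record(self, key, row):
        with open(self._path, "a") as f:
            f.write(json.dumps({"key": key, "row": row}) + "\n")

    def expected_rows(self):
        rows = {}
        with open(self._path) as f:
            for line in f:
                entry = json.loads(line)
                rows[entry["key"]] = entry["row"]  # last write wins
        return rows


def create_rows(log, count):
    """Data creation only needs `record`, so either backend plugs in."""
    for key in range(count):
        log.record(key, {"id": key, "value": key * 2})


memory_log = MemoryLog()
create_rows(memory_log, 10)

with tempfile.TemporaryDirectory() as tmp:
    disk_log = DiskLog(os.path.join(tmp, "expected_rows.jsonl"))
    create_rows(disk_log, 10)
    disk_rows = disk_log.expected_rows()
```

The data creation code never checks which backend it was given, which is the point of keeping the interface to record and expected_rows.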

This is far from complete, but I wanted to show a kernel of the idea.

To make it really great we'd need novel (random) schema generation, and the code would need
to know which operations are available on a generated schema for a particular C* version (complicated,
perhaps, but fun).
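
To give a feel for the schema generation half of that, here is a small sketch that builds a random CREATE TABLE statement from a seeded generator. The function name, the column naming scheme and the type lists are all assumptions for illustration, and it does not attempt the version-aware part.

```python
import random

# Small illustrative subsets of CQL types; a real generator would cover more.
KEY_TYPES = ["int", "bigint", "text", "uuid"]
VALUE_TYPES = ["int", "bigint", "text", "boolean", "uuid", "timestamp"]


def random_table_cql(rng, keyspace="ks", table="t"):
    """Return a CREATE TABLE statement with randomly chosen columns."""
    pk = [f"pk{i}" for i in range(rng.randint(1, 2))]    # partition key columns
    ck = [f"ck{i}" for i in range(rng.randint(0, 2))]    # clustering columns
    vals = [f"v{i}" for i in range(rng.randint(1, 4))]   # regular columns

    columns = [f"{name} {rng.choice(KEY_TYPES)}" for name in pk + ck]
    columns += [f"{name} {rng.choice(VALUE_TYPES)}" for name in vals]

    # Partition key goes in its own parentheses, followed by clustering columns.
    key_parts = ["(" + ", ".join(pk) + ")"] + ck
    primary_key = f"PRIMARY KEY ({', '.join(key_parts)})"
    return f"CREATE TABLE {keyspace}.{table} ({', '.join(columns + [primary_key])})"


# Seeding the generator makes a failing schema reproducible.
statement = random_table_cql(random.Random(8654))
```

Logging the seed alongside any test failure would let the same schema be regenerated later.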

Another direction we could take is to figure out a way to do DB schema/operations with
semi-predictable data patterns, and capture the on-disk log as something more sparse
that understands ranges (so if we have pkey 1..1000 and key2 1..1000, there's maybe no real
need to capture those million cells to a log in long form; we could abbreviate that somehow).
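
One way such an abbreviated log might look: store the key ranges and a deterministic rule for the expected value, plus explicit entries only for cells that deviate. RangeLog and its value rule are hypothetical, a sketch of the idea rather than a worked-out design.

```python
class RangeLog:
    """Sparse expected-state log: ranges plus a value rule, not one entry per cell."""

    def __init__(self, pkeys, ckeys, value_fn):
        self.pkeys = pkeys        # e.g. range(1, 1001)
        self.ckeys = ckeys        # e.g. range(1, 1001)
        self.value_fn = value_fn  # deterministic rule giving the expected value
        self.overrides = {}       # only cells that deviate from the rule

    def record_override(self, pkey, ckey, value):
        self.overrides[(pkey, ckey)] = value

    def expected(self, pkey, ckey):
        if pkey not in self.pkeys or ckey not in self.ckeys:
            raise KeyError((pkey, ckey))
        return self.overrides.get((pkey, ckey), self.value_fn(pkey, ckey))

    def cell_count(self):
        return len(self.pkeys) * len(self.ckeys)


# A million cells described by two ranges, one rule and one explicit override.
range_log = RangeLog(range(1, 1001), range(1, 1001), lambda p, c: p * 1000 + c)
range_log.record_override(7, 42, -1)
```

The log grows with the number of irregular writes rather than with the number of cells, which only pays off if most data really does follow a predictable pattern.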

> Data validation test
> --------------------
>
>                 Key: CASSANDRA-8654
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8654
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Russ Hatch
>            Assignee: Russ Hatch
>
> There was a recent discussion about the utility of data validation testing.
> The goal here would be a harness of some kind that can mix operations and track its own
> notion of what the DB state should look like, and verify it in detail, or perhaps a sampling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
