Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C9C43178EA for ; Tue, 20 Jan 2015 19:22:35 +0000 (UTC)
Received: (qmail 97553 invoked by uid 500); 20 Jan 2015 19:22:35 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 97440 invoked by uid 500); 20 Jan 2015 19:22:35 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 97207 invoked by uid 99); 20 Jan 2015 19:22:35 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Jan 2015 19:22:35 +0000
Date: Tue, 20 Jan 2015 19:22:35 +0000 (UTC)
From: "Russ Hatch (JIRA)"
To: commits@cassandra.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (CASSANDRA-8654) Data validation test
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284249#comment-14284249 ]

Russ Hatch commented on CASSANDRA-8654:
---------------------------------------

One notion I have explored is doing this from dtest using a simple on-disk log of row contents. My prototype used the datahelp.py functionality in dtest to create data in C* and also maintained the log, which is used as the authority on what the DB rows should look like. I can expand on this idea further, but it does have some drawbacks in its present state (it would take some work to make it really useful).

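As a rough illustration of the log-as-authority idea above (this is not the prototype's code, and none of these names come from dtest or datahelp.py), here is a minimal sketch. It assumes rows are plain dicts, that the log is a JSON-lines file, and it uses a Python list as a stand-in for a C* session so the example runs on its own. `DiskRowLog`, `create_data`, and `verify` are all hypothetical names.

```python
import json
import os
import tempfile


class DiskRowLog:
    """Hypothetical append-only log of expected rows, one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def record(self, row):
        # Append the row so the file always reflects every write issued so far.
        with open(self.path, "a") as f:
            f.write(json.dumps(row, sort_keys=True) + "\n")

    def expected_rows(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]


def create_data(write_row, log, count):
    """Write `count` rows through `write_row` and log each one as expected state."""
    for i in range(count):
        row = {"id": i, "value": "val%d" % i}
        write_row(row)
        log.record(row)


def verify(read_rows, log):
    """Return (missing, unexpected) rows, treating the log as the authority."""
    def key(row):
        return json.dumps(row, sort_keys=True)

    expected = {key(r) for r in log.expected_rows()}
    actual = {key(r) for r in read_rows()}
    return sorted(expected - actual), sorted(actual - expected)


# Stand-in for the database: a plain list instead of a real C* session.
fake_db = []
log = DiskRowLog(os.path.join(tempfile.mkdtemp(), "expected_rows.log"))
create_data(fake_db.append, log, 5)
missing, unexpected = verify(lambda: list(fake_db), log)
```

In a real dtest the `write_row` and `read_rows` callables would wrap queries against the cluster. This sketch only handles inserts of distinct rows; updates and deletes would need the log to track the latest expected state per key.
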
This is incomplete, but in a very basic sense the dtest would look a bit like this:
https://github.com/riptano/cassandra-dtest/blob/experimental_datatool/paging_test.py#L589

Create a log object of some kind, then make a call to create a bunch of data, passing in the log so the data creation code can record the expected DB state.

The other notion in this prototype was to make the logging pluggable, so if we're testing a smaller dataset we could plug in an in-memory log instead of a disk-backed one:
https://github.com/riptano/cassandra-dtest/blob/experimental_datatool/datahelp.py#L158

This is far from complete, but I wanted to show a kernel of the idea. To make it really great we'd need novel (random) schema generation, and the code would need to know which operations are available on a generated schema for a particular C* version (complicated perhaps, but fun).

Another direction we could take is figuring out a way to do DB schema/operations with semi-predictable data patterns, so that the on-disk log could be captured as something sparser that understands ranges. For example, if we have pkey 1..1000 and key2 1..1000, there's maybe no real need to capture those million cells in the log in long form; we could abbreviate that somehow.

> Data validation test
> --------------------
>
>                 Key: CASSANDRA-8654
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8654
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Russ Hatch
>            Assignee: Russ Hatch
>
> There was a recent discussion about the utility of data validation testing.
> The goal here would be a harness of some kind that can mix operations and track its own notion of what the DB state should look like, and verify it in detail, or perhaps a sampling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)