Date: Mon, 26 Jan 2015 19:34:04 +0000 (UTC)
From: Jan <cnet62@yahoo.com>
To: user@cassandra.apache.org
Subject: Re: Fixtures / CI docker

Hi Alain;

The requirements as stated are impossible to meet: you expect predictable, deterministic tests while also needing "recent data" (max 1 week old). Reason: you cannot have a replicable result set when the data varies on a weekly basis.
To obtain a replicable test result, I recommend the following:
a) Pin the 'data' expectation to a point in time which is a known quantity.
b) Load some data into your cluster & take a snapshot. Reload this snapshot before every test for consistent results.

Hope this helps.

Jan / C* Architect

On Monday, January 26, 2015 10:43 AM, Eric Stevens <mightye@gmail.com> wrote:

I don't have directly relevant advice, especially WRT getting a meaningful and coherent subset of your production data - that's probably too closely coupled with your business logic. Perhaps you can run a testing cluster with a default TTL of ~2 weeks on all your tables, feeding it with real production data so that you have a rolling current snapshot of production.

We use this basic strategy to support integration tests with the rest of our platform. We have a data access service, with other internal teams acting as customers of that data. But it's hard to write strong tests against this, because it becomes challenging to predict the values you should expect to get back without rewriting the business logic directly into your tests (and then what exactly are you testing - your tests?).

Our data interaction layer tests instead all focus on inserting the data under test immediately before the assertions portion of the given test. We use Specs2 as a testing framework, which gives us access to a very nice "eventually { ... }" syntax that retries the assertions portion several times with a backoff (so we can account for the eventually consistent nature of Cassandra and reduce the number of false failures, without test-speed-impacting operations like sleeping before each assert).
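For those not using Specs2, the "eventually" idea (retry the assertion block with a growing backoff rather than sleeping a fixed worst-case delay before every assert) can be sketched in plain Python. The names and parameters below are illustrative, not Specs2's actual API:

```python
import time


def eventually(assertion, retries=5, initial_delay=0.1, backoff=2.0):
    """Run `assertion` until it passes or retries are exhausted.

    Each failed attempt waits initial_delay * backoff**attempt seconds,
    so a test against an eventually consistent store converges as soon
    as the data arrives instead of always paying a fixed sleep.
    """
    delay = initial_delay
    for attempt in range(retries):
        try:
            assertion()
            return
        except AssertionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the real failure
            time.sleep(delay)
            delay *= backoff


# Usage: simulate a read that only becomes consistent on the third attempt.
state = {"reads": 0}


def read_row():
    state["reads"] += 1
    return "row" if state["reads"] >= 3 else None


def check():
    assert read_row() == "row"


eventually(check)  # passes once the simulated store catches up
```

The same shape drops into any test framework; the key design choice is that only `AssertionError` is retried, so genuine errors (connection failures, bugs) still fail the test immediately.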
Basically, our data access layer unit tests are strong and rely only on synthetic data (assert that the response is exact for every value), while integration tests from other systems use much softer checks against real data (more like: is there data, and does it seem to be in the right format and for the right time range?).

On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

Hi guys,

We currently use a CI with tests based on Docker containers. We have a C* service "dockerized". Yet we have an issue, since we would like two things that are hard to achieve together:

- A fixed data set, to have predictable and deterministic tests (that we can repeat at any time with the same result)
- A recent data set, to perform smoke testing on services that need "recent data" (max 1 week old data)

As our dataset is very big and data is not sorted by date in SSTables, it is hard to get a coherent extract of the production data. Has anyone of you achieved something like this?

For "static" data we could write queries by hand, but I find it more relevant to have a real production extract. For dynamic data we need a process we could repeat every day / week to update the data, and something light enough to keep container startup fast.

How do you guys do this kind of thing?

FWIW we are migrating to 2.0.11 very soon, so solutions may use 2.0 features.

Any idea is welcome, and if you need more info, please ask.

C*heers,

Alain
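For anyone wanting to try the snapshot-and-reload approach suggested in this thread, here is a rough shell sketch using Cassandra 2.0-era tooling. The keyspace (`app`), table (`events`), snapshot tag, and data path are all illustrative, and it assumes a single-node test cluster such as a Docker container:

```shell
# 1. Load the fixture data into the test cluster, then snapshot it once:
nodetool snapshot -t fixture app

# 2. Before each test run, restore: clear live SSTables, copy the
#    snapshot files back, and tell Cassandra to pick them up.
DATA_DIR=/var/lib/cassandra/data/app/events
rm -f "$DATA_DIR"/*.db
cp "$DATA_DIR"/snapshots/fixture/* "$DATA_DIR"/
nodetool refresh app events

# 3. Optionally, on a testing cluster fed from production, cap data age
#    as Eric suggests with a two-week default TTL (run in cqlsh):
#      ALTER TABLE app.events WITH default_time_to_live = 1209600;
```

Step 2 keeps container startup light because restoring a snapshot is a file copy rather than a replay of writes; for multi-node clusters or cross-cluster loading, `sstableloader` would be the usual tool instead.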