Date: Mon, 26 Jan 2015 19:34:04 +0000 (UTC)
From: Jan <cnet62@yahoo.com>
To: user@cassandra.apache.org
Subject: Re: Fixtures / CI docker

Hi Alain;

The requirements as stated are impossible to meet: you expect predictable, deterministic tests while also needing "recent data" (max 1 week old). Reason: you cannot have a replicable result set when the data varies on a weekly basis.
To obtain a replicable test result, I recommend the following:
a) Pin the 'data' expectation to a point in time which is a known quantity.
b) Load some data into your cluster & take a snapshot. Reload this snapshot before every test for consistent results.

Hope this helps.

Jan / C* Architect

On Monday, January 26, 2015 10:43 AM, Eric Stevens <mightye@gmail.com> wrote:

I don't have directly relevant advice, especially WRT getting a meaningful and coherent subset of your production data - that's probably too closely coupled with your business logic. Perhaps you can run a testing cluster with a default TTL of ~2 weeks on all your tables, feeding it with real production data so that you have a rolling current snapshot of production.

We use this basic strategy to support integration tests with the rest of our platform. We have a data access service, with other internal teams acting as customers of that data. But it's hard to write strong tests against this, because it becomes challenging to predict the values you should expect to get back without rewriting the business logic directly into your tests (and then what exactly are you testing - your tests?).

Our data interaction layer tests instead all focus on inserting the data under test immediately before the assertions portion of the given test. We use Specs2 as a testing framework, which gives us access to a very nice "eventually { ... }" syntax that retries the assertions portion several times with a backoff (so we can account for the eventually consistent nature of Cassandra and reduce the number of false failures, without test-speed-impacting operations like sleeping before each assert).
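For those not using Specs2, the "eventually" idea (retry the assertion block with a growing backoff rather than sleeping a fixed worst-case delay before every assert) can be sketched in plain Python. The names and parameters below are illustrative, not Specs2's actual API:

```python
import time


def eventually(assertion, retries=5, initial_delay=0.1, backoff=2.0):
    """Run `assertion` until it passes or retries are exhausted.

    Each failed attempt waits initial_delay * backoff**attempt seconds,
    so a test against an eventually consistent store converges as soon
    as the data arrives instead of always paying a fixed sleep.
    """
    delay = initial_delay
    for attempt in range(retries):
        try:
            assertion()
            return
        except AssertionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the real failure
            time.sleep(delay)
            delay *= backoff


# Usage: simulate a read that only becomes consistent on the third attempt.
state = {"reads": 0}


def read_row():
    state["reads"] += 1
    return "row" if state["reads"] >= 3 else None


def check():
    assert read_row() == "row"


eventually(check)  # passes once the simulated store catches up
```

The same shape drops into any test framework; the key design choice is that only `AssertionError` is retried, so genuine errors (connection failures, bugs) still fail the test immediately.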
Basically, our data access layer unit tests are strong and rely only on synthetic data (assert that the response is exact for every value), while integration tests from other systems use much softer checks against real data (more like: is there data, and does it seem to be in the right format and for the right time range?).

On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

Hi guys,

We currently use a CI with tests based on Docker containers. We have a C* service "dockerized". Yet we have an issue, since we would like two things that are hard to achieve together:

- A fixed data set, to have predictable and deterministic tests (that we can repeat at any time with the same result)
- A recent data set, to perform smoke testing on services that need "recent data" (max 1 week old data)

As our dataset is very big and data is not sorted by date in SSTables, it is hard to get a coherent extract of the production data. Has anyone of you achieved something like this?

For "static" data we could write queries by hand, but I find it more relevant to have a real production extract. For dynamic data we need a process we could repeat every day / week to update the data, and something light enough to keep container startup fast.

How do you guys do this kind of thing?

FWIW we are migrating to 2.0.11 very soon, so solutions may use 2.0 features.

Any idea is welcome, and if you need more info, please ask.

C*heers,

Alain
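For anyone wanting to try the snapshot-and-reload approach suggested in this thread, here is a rough shell sketch using Cassandra 2.0-era tooling. The keyspace (`app`), table (`events`), snapshot tag, and data path are all illustrative, and it assumes a single-node test cluster such as a Docker container:

```shell
# 1. Load the fixture data into the test cluster, then snapshot it once:
nodetool snapshot -t fixture app

# 2. Before each test run, restore: clear live SSTables, copy the
#    snapshot files back, and tell Cassandra to pick them up.
DATA_DIR=/var/lib/cassandra/data/app/events
rm -f "$DATA_DIR"/*.db
cp "$DATA_DIR"/snapshots/fixture/* "$DATA_DIR"/
nodetool refresh app events

# 3. Optionally, on a testing cluster fed from production, cap data age
#    as Eric suggests with a two-week default TTL (run in cqlsh):
#      ALTER TABLE app.events WITH default_time_to_live = 1209600;
```

Step 2 keeps container startup light because restoring a snapshot is a file copy rather than a replay of writes; for multi-node clusters or cross-cluster loading, `sstableloader` would be the usual tool instead.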