Date: Sat, 27 Jun 2015 09:27:05 +0000 (UTC)
From: "Robert Stupp (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9619) Read performance regression in tables with many columns on trunk and 2.2 vs. 2.1

    [ https://issues.apache.org/jira/browse/CASSANDRA-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604072#comment-14604072 ]

Robert Stupp commented on CASSANDRA-9619:
-----------------------------------------

The regression for this workload is caused by {{sstable_preemptive_open_interval_in_mb}} being ignored (hard-coded to {{-1}}) in 2.1.3 and 2.1.4. It is evaluated in the versions before and after these releases.

cstar runs:
* The [last "bisect" run|http://cstar.datastax.com/tests/id/8ed4f4c0-1c48-11e5-b36d-42010af0688f], which [identifies|http://cstar.datastax.com/graph?stats=8ed4f4c0-1c48-11e5-b36d-42010af0688f&metric=op_rate&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=50.6&ymin=0&ymax=196958.3] this [commit|https://github.com/apache/cassandra/commit/cf3e748cbf1faaed68870f22a45edc603eb1b4e8].
* A [cross check|http://cstar.datastax.com/tests/id/1eee9132-1c4f-11e5-bcd7-42010af0688f] comparing [latest 2.1 against 2.1 with that commit reverted|http://cstar.datastax.com/graph?stats=1eee9132-1c4f-11e5-bcd7-42010af0688f&metric=op_rate&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=50.82&ymin=0&ymax=206758.2].
* A [cross check|http://cstar.datastax.com/tests/id/53f35062-1c53-11e5-bcd7-42010af0688f] comparing [latest 2.2 against 2.2 with that commit reverted|http://cstar.datastax.com/graph?stats=53f35062-1c53-11e5-bcd7-42010af0688f&metric=op_rate&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=50.27&ymin=0&ymax=195163.1].

That's the good news. More good news: a simple {{cassandra.yaml}} change can "solve" this regression.

The bad news, IMO, is that {{sstable_preemptive_open_interval_in_mb}} has real meaning and AFAIK should give some improvement for "matching" workloads. Frankly, I don't know what to do next - whether to let it default to {{-1}}, stick with the current default of {{50}}, or change it to something else. IMO some extensive perf testing should be done (again??) to give better guidance for this parameter.
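For reference, the {{cassandra.yaml}} change amounts to a single line. A minimal sketch, assuming one simply wants to restore the (accidental) 2.1.3/2.1.4 behavior for this workload - the values are the ones discussed above, not a recommendation:

{code}
# cassandra.yaml

# Current default: preemptively open compaction results every 50 MB.
# sstable_preemptive_open_interval_in_mb: 50

# Workaround matching the hard-coded 2.1.3/2.1.4 behavior:
# disable preemptive opening entirely.
sstable_preemptive_open_interval_in_mb: -1
{code}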
I think this is also the reason why blade_11 and bdplab gave different results - one has SSDs and one has spindles - but that's just a guess. For reference, I've started the [2.1 cross-check on bdplab|http://cstar.datastax.com/tests/id/32532262-1cac-11e5-8031-42010af0688f].

More bad news: there seems to be another, smaller regression of approx. 1.5-4% for both reads and writes when comparing 2.1.4 against the current 2.1/2.2 branches with {{sstable_preemptive_open_interval_in_mb=-1}}. This one is much harder to track down - it is likely caused by "pure" code change(s).

Finally, I have to say that we should have at least a daily cstar performance test with some "standard" workloads (90% writes, 90% reads, 50/50) against the current dev branches (2.1, 2.2, trunk), linked in cassci (since that's where we usually look). These tests don't need to run for long - 2M or 3M keys should be enough to catch obvious regressions; a sketch of possible invocations follows below. For more "detailed" results we already have extensive tests in place. Besides that, we should run perf tests before commit for everything that is likely to affect performance.
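To illustrate what such "standard" workloads could look like, here is a sketch using the cassandra-stress {{mixed}} mode - the key count, thread count, and ratios are placeholders roughly matching the numbers above, not an agreed test plan:

{code}
# populate first, as in the jobs above (smaller key count)
stress write n=3000000 -rate threads=300

# 90% writes / 10% reads
stress mixed ratio\(write=9,read=1\) n=3000000 -rate threads=300
# 90% reads / 10% writes
stress mixed ratio\(write=1,read=9\) n=3000000 -rate threads=300
# 50/50
stress mixed ratio\(write=1,read=1\) n=3000000 -rate threads=300
{code}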
> Read performance regression in tables with many columns on trunk and 2.2 vs. 2.1
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9619
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jim Witschey
>              Labels: performance
>             Fix For: 2.2.0 rc2
>
>
> There seems to be a read performance regression in 2.2 and trunk, as compared to 2.1 and 2.0. I found it running cstar_perf jobs with 50-column tables. 2.2 may be worse than trunk, though my results on that aren't consistent. The relevant cstar_perf jobs are here:
> http://cstar.datastax.com/tests/id/273e2ea8-0fc8-11e5-816c-42010af0688f
> http://cstar.datastax.com/tests/id/3a8002d6-1480-11e5-97ff-42010af0688f
> http://cstar.datastax.com/tests/id/40ff2766-1248-11e5-bac8-42010af0688f
> The sequence of commands for these jobs is:
> {code}
> stress write n=65000000 -rate threads=300 -col n=FIXED\(50\)
> stress read n=65000000 -rate threads=300
> stress read n=65000000 -rate threads=300
> {code}
> Have a look at the operations per second going from [the first read operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7] to [the second read operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7]. They fall from ~135K to ~100K comparing trunk to 2.1 and 2.0. It's slightly worse on 2.2, and on 2.2 operations per second fall continuously from the first to the second read operation.
> There's a corresponding increase in read latency - it's noticeable on trunk and pretty bad on 2.2. Again, the latency gets higher and higher on 2.2 as the read operations progress (see the graphs [here|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=17.27] and [here|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=928.62&ymin=0&ymax=14.52]).
> I see a similar regression in a [more recent test|http://cstar.datastax.com/graph?stats=40ff2766-1248-11e5-bac8-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=752.62&ymin=0&ymax=171799.1], though in this one trunk performed worse than 2.2. This run also didn't display the increasing latency on 2.2.
> This regression may show up for smaller numbers of columns, but not as prominently, as shown [in the results of this test with the stress default of 5 columns|http://cstar.datastax.com/graph?stats=227cb89e-0fc8-11e5-9f14-42010af0688f&metric=99.9th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=498.19&ymin=0&ymax=334.29]. There's an increase in latency variability on trunk and 2.2, but I don't see a regression in the summary statistics.
> My measurements aren't confounded by [the recent regression in cassandra-stress|https://issues.apache.org/jira/browse/CASSANDRA-9558]; cstar_perf uses the same stress program (from trunk) on all versions on the cluster.
> I'm currently working to:
> - reproduce with a smaller workload so this is easier to bisect and debug.
> - get results with larger numbers of columns, since we've seen the regression on 50 columns but not the stress default of 5.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)