From: "Jack Krupansky" <jack@basetechnology.com>
To: user@cassandra.apache.org
Subject: Re: trouble showing cluster scalability for read performance
Date: Thu, 17 Jul 2014 10:03:54 -0400
It sounds as if you are actually testing “vertical scalability” (load on a single node) rather than Cassandra’s sweet spot of “horizontal scalability” (adding more nodes to handle higher load). Maybe you could clarify your intentions and specific use case.
 
Also, it sounds like you are trying to focus on large queries, but Cassandra’s sweet spot is lots of smaller queries. With larger queries you can end up measuring the capabilities of your hardware (CPU cores, memory, I/O bandwidth, network latency, JVM configuration, etc.) rather than measuring Cassandra per se. So, again, maybe you could clarify your intended use case.
 
It might be that you need to add more “vertical scale” (bigger box, more cores, more memory, beefier I/O and networking) to handle large queries, or maybe simple, Cassandra-style “horizontal scaling” (adding nodes) will be sufficient. Sure, you can tune Cassandra for single-node performance, but that seems like a lot of extra work to me, compared to adding more cheap nodes.
 
-- Jack Krupansky
 
From: Diane Griffith
Sent: Thursday, July 17, 2014 9:31 AM
To: user
Subject: Re: trouble showing cluster scalability for read performance
 
Duncan,
 
Thanks for that feedback. I'll give a bit more info and then ask some more questions.
 
Our Goal: not to produce the fastest read, but to show horizontal scaling.
 
Test procedure
* Inserted 54M rows, where one third of that represents unique keys: 18M keys. The end result, given our schema, is that the 54M rows become 72M rows in the column family, serving as the control query load.
* Have a client that queries 100k records in configurable batches, set to 1k, and then does 100 reps of queries. It doesn't do the same keys for each rep; it uses an offset and then increases the keys to query (a sketch of this loop follows the list).
* We can adjust the hit rate, i.e. how many of the keys will be found, but have been focused on a 100% hit rate.
* We run the query where multiple clients can be spawned to do the same query cycle of 100k keys, but the offset is not different, so each client will query the same keys.
* We thought we should manually compact the tables down to 1 sstable on a given node for consistent results across different cluster sizes.
* We had set replication factor to 1 originally so as not to complicate things or impact initial write times; our thought was to assess RF later. Since we changed the keys getting queried, it would have to hit additional nodes to get row data, but even for just 1 client thread (the simplest path to show horizontal scaling) we saw a slight decrease in performance when going from 2 nodes to 4 nodes.
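Roughly, that query loop looks like the following sketch, using the DataStax Python driver; the contact point, keyspace, table, and key format are hypothetical stand-ins rather than our actual harness:

    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1"])          # any node in the cluster
    session = cluster.connect("perf_test")   # hypothetical keyspace
    lookup = session.prepare("SELECT * FROM rows WHERE key = ?")

    KEYS_PER_REP = 100_000   # 100k keys queried per rep
    BATCH = 1_000            # configurable batch size, set to 1k
    REPS = 100

    for rep in range(REPS):
        offset = rep * KEYS_PER_REP  # advance the offset so reps don't repeat keys
        for start in range(0, KEYS_PER_REP, BATCH):
            # issue one batch of 1k point lookups asynchronously, then drain them
            futures = [session.execute_async(lookup, ["key-%d" % (offset + start + i)])
                       for i in range(BATCH)]
            for f in futures:
                f.result()

Every spawned client running with the same offsets is what makes all clients query the same keys.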
 
Things seen with the given procedure and setup:
 
  1. 1 client thread: 2 nodes do better than 1 node on the query test, but 4 nodes did not do better than 2.
  2. 2 client threads: 2 nodes were still doing better than 1 node.
  3. 10 client threads: the times drastically suffered, and 2 nodes were doing 1/2 the speed of 1 node, whereas before, with 1 to 2 threads, 2 nodes performed better than 1 node. There was a huge decrease in performance on 2 nodes and just a mild decrease on 1 node.
Note: at 50+ threads things were also drastically falling apart.
 
Observations:
  • Compacting each node down to 1 sstable did not seem to help: running 10 client threads against exploded sstables on 2 nodes was 2x better than the last 2-node, 10-client test, but there was still a decrease in performance from 1 to 2 threads when querying against the compacted tables.
  • I would see upwards of 10 read requests pending at times, while 8 to 10 were processing, when I did nodetool tpstats (see the command after this list).
  • Having the key cache on or disabled did not seem to impact things noticeably with our current configuration.
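For reference, those pending/processing counts come from the ReadStage line of nodetool tpstats; a quick way to watch them while a test runs (assuming nodetool is on the PATH of a cluster node) is:

    watch -n 1 'nodetool tpstats | grep -E "Pool Name|ReadStage"'

A persistently non-zero pending count on ReadStage means reads are queuing behind the available reader threads.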
 
Questions:
  1. Can multiple threads read the same sstable at the same time? Does compacting down to 1 sstable (to get a given row into one sstable) add any benefit, or does it actually hurt, as our limited testing has indicated so far? (The compaction command in question is shown after this list.)
  2. Given the above testing process, does it still make sense to adjust the replication factor to the cluster size (i.e. 1 for a 1-node cluster, 2 for a 2-node cluster, 3 for an n-node cluster)? We assumed it was just the ability for threads to connect to a coordinator that would help, but it sounds like reads can still block.
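For reference, the manual major compaction referred to above is (keyspace and table names hypothetical):

    nodetool compact perf_test rows

Under size-tiered compaction this merges all of that table's sstables on the node into one.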
 
I'm going to try a limited test with a changed replication factor. But if anyone has input on whether compacting to 1 sstable is a benefit or a detriment for a simple scalability test, on how (if at all) Cassandra blocks when reading sstables, and on whether higher replication factors do indeed help produce reliable results, it would be appreciated. I know part of our charter was to keep it simple to produce the scalability proof, but it does sound like the replication factor is hurting us if the delay between clients for the same keys is not long enough, given that we are not doing different offsets for each client thread.
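For the replication factor test, the change itself is a one-line ALTER (keyspace name hypothetical):

    ALTER KEYSPACE perf_test
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

followed by nodetool repair perf_test on each node, so the added replicas actually receive the existing data; without the repair, a read can land on a replica that doesn't yet hold the row.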
 
Thanks,
Diane

On Thu, Jul 17, 2014 at 3:53 AM, Duncan Sands <duncan.sands@gmail.com> wrote:
Hi Diane,


On 17/07/14 06:19, Diane Griffith wrote:
We have been struggling to prove out linear read performance with our Cassandra configuration, i.e. that it is horizontally scaling. Wondering if anyone has any suggestions for what minimal configuration and approach to use to demonstrate this.

We were trying to go for a simple setup, so on the keyspace and/or column families we went with the following settings, thinking they were the minimum to prove scaling:

replication_factor set to 1,

An RF of 1 means that any particular bit of data exists on exactly one node. So if you are testing read speed by reading the same data item again and again as fast as you can, then all the reads will be coming from the same one node, the one that has that data item on it. In this situation adding more nodes won't help. Maybe this isn't exactly how you are testing read speed, but perhaps you are doing something analogous? I suggest you explain exactly how you are measuring read speed.
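You can check this directly by asking Cassandra which node(s) own a given key; with an RF of 1 it prints exactly one address per key (keyspace, table, and key here are placeholders):

    nodetool getendpoints perf_test rows key-12345

If the keys you are reading all map to the same node, adding more nodes won't speed those reads up.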

Ciao, Duncan.

SimpleStrategy,
default consistency level,
default compaction strategy (size tiered),
but compacted down to 1 sstable per cf on each node (versus using leveled compaction for read performance)

*Read Performance Results:*

1 client thread - 2 nodes > 1 node was seen, but we couldn't show increased performance when adding more nodes, i.e. 4 nodes was not > 2 nodes.
2 client threads - 2 nodes > 1 node still held, but again we couldn't show increased performance when adding more nodes, i.e. 4 nodes was not > 2 nodes.
10 client threads - this time 2 nodes < 1 node on the performance numbers; 2 nodes suffered a larger reduction in throughput than 1 node.

Where are we going wrong?

How have others shown horizontal scaling for reads?

Thanks,
Diane

 