From: "Jack Krupansky" <jack@basetechnology.com>
To: user@cassandra.apache.org
Subject: Re: trouble showing cluster scalability for read performance
Date: Thu, 17 Jul 2014 10:03:54 -0400
It sounds as if you are actually testing “vertical scalability” (load on a single node) rather than Cassandra’s sweet spot of “horizontal scalability” (adding more nodes to handle higher load). Maybe you could clarify your intentions and specific use case.
 
Also, it sounds like you are trying to focus on large queries, but Cassandra’s sweet spot is lots of smaller queries. With larger queries you can end up measuring the capabilities of your hardware (CPU cores, memory, I/O bandwidth, network latency, JVM configuration, etc.) rather than measuring Cassandra per se. So, again, maybe you could clarify your intended use case.
 
It might be that you need to add more “vertical scale” (bigger box, more cores, more memory, beefier I/O and networking) to handle large queries, or maybe simple, Cassandra-style “horizontal scaling” (adding nodes) will be sufficient. Sure, you can tune Cassandra for single-node performance, but that seems like a lot of extra work to me, compared to adding more cheap nodes.
 
-- Jack Krupansky
 
From: Diane Griffith
Sent: Thursday, July 17, 2014 9:31 AM
To: user
Subject: Re: trouble showing cluster scalability for read performance
 
Duncan,
 
Thanks for that feedback. I'll give a bit more info and then ask some more questions.
 
Our Goal: not to produce the fastest read, but to show horizontal scaling.
 
Test procedure
* Inserted 54M rows, where one third of that represents unique keys: 18M keys. The end result, given our schema, is that the 54M rows become 72M rows in the column family, serving as the control query load.
* Have a client that queries 100k records in configurable batches, set to 1k, and then does 100 reps of queries. It doesn't do the same keys for each rep; it uses an offset and then increases the keys to query (a sketch of this loop follows the list).
* We can adjust the hit rate, i.e. how many of the keys will be found, but have been focused on a 100% hit rate.
* We run the query where multiple clients can be spawned to do the same query cycle of 100k keys, but the offset is not different, so each client will query the same keys.
* We thought we should manually compact the tables down to 1 sstable on a given node for consistent results across different cluster sizes.
* We had set replication factor to 1 originally so as not to complicate things or impact initial write times; our thought was to assess RF later. Since we changed the keys getting queried, it would have to hit additional nodes to get row data, but even for just 1 client thread (the simplest path to show horizontal scaling) we saw a slight decrease in performance when going from 2 nodes to 4 nodes.
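Roughly, that query loop looks like the following sketch, using the DataStax Python driver; the contact point, keyspace, table, and key format are hypothetical stand-ins rather than our actual harness:

    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1"])          # any node in the cluster
    session = cluster.connect("perf_test")   # hypothetical keyspace
    lookup = session.prepare("SELECT * FROM rows WHERE key = ?")

    KEYS_PER_REP = 100_000   # 100k keys queried per rep
    BATCH = 1_000            # configurable batch size, set to 1k
    REPS = 100

    for rep in range(REPS):
        offset = rep * KEYS_PER_REP  # advance the offset so reps don't repeat keys
        for start in range(0, KEYS_PER_REP, BATCH):
            # issue one batch of 1k point lookups asynchronously, then drain them
            futures = [session.execute_async(lookup, ["key-%d" % (offset + start + i)])
                       for i in range(BATCH)]
            for f in futures:
                f.result()

Every spawned client running with the same offsets is what makes all clients query the same keys.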
 
Things seen with the given procedure and setup:
 
  1. 1 client thread: 2 nodes do better than 1 node on the query test, but 4 nodes did not do better than 2.
  2. 2 client threads: 2 nodes were still doing better than 1 node.
  3. 10 client threads: the times drastically suffered, and 2 nodes were doing 1/2 the speed of 1 node, whereas before, with 1 to 2 threads, 2 nodes performed better than 1 node. There was a huge decrease in performance on 2 nodes and just a mild decrease on 1 node.
Note: at 50+ threads things were also drastically falling apart.
 
Observations:
  • Compacting each node down to 1 sstable did not seem to help: running 10 client threads against exploded sstables on 2 nodes was 2x better than the last 2-node, 10-client test, but there was still a decrease in performance from 1 to 2 threads when querying against the compacted tables.
  • I would see upwards of 10 read requests pending at times, while 8 to 10 were processing, when I did nodetool tpstats (see the command after this list).
  • Having the key cache on or disabled did not seem to impact things noticeably with our current configuration.
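For reference, those pending/processing counts come from the ReadStage line of nodetool tpstats; a quick way to watch them while a test runs (assuming nodetool is on the PATH of a cluster node) is:

    watch -n 1 'nodetool tpstats | grep -E "Pool Name|ReadStage"'

A persistently non-zero pending count on ReadStage means reads are queuing behind the available reader threads.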
 
Questions:
  1. Can multiple threads read the same sstable at the same time? Does compacting down to 1 sstable (to get a given row into one sstable) add any benefit, or does it actually hurt, as our limited testing has indicated so far? (The compaction command in question is shown after this list.)
  2. Given the above testing process, does it still make sense to adjust the replication factor to the cluster size (i.e. 1 for a 1-node cluster, 2 for a 2-node cluster, 3 for an n-node cluster)? We assumed it was just the ability for threads to connect to a coordinator that would help, but it sounds like reads can still block.
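For reference, the manual major compaction referred to above is (keyspace and table names hypothetical):

    nodetool compact perf_test rows

Under size-tiered compaction this merges all of that table's sstables on the node into one.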
 
I'm going to try a limited test with a changed replication factor. But if anyone has input on whether compacting to 1 sstable is a benefit or a detriment for a simple scalability test, on how (if at all) Cassandra blocks when reading sstables, and on whether higher replication factors do indeed help produce reliable results, it would be appreciated. I know part of our charter was to keep it simple to produce the scalability proof, but it does sound like the replication factor is hurting us if the delay between clients for the same keys is not long enough, given that we are not doing different offsets for each client thread.
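For the replication factor test, the change itself is a one-line ALTER (keyspace name hypothetical):

    ALTER KEYSPACE perf_test
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

followed by nodetool repair perf_test on each node, so the added replicas actually receive the existing data; without the repair, a read can land on a replica that doesn't yet hold the row.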
 
Thanks,
Diane

On Thu, Jul 17, 2014 at 3:53 AM, Duncan Sands <duncan.sands@gmail.com> wrote:
Hi Diane,


On 17/07/14 06:19, Diane Griffith wrote:
We have been struggling to prove out linear read performance with our Cassandra configuration, i.e. that it is horizontally scaling. Wondering if anyone has any suggestions for what minimal configuration and approach to use to demonstrate this.

We were trying to go for a simple setup, so on the keyspace and/or column families we went with the following settings, thinking they were the minimum to prove scaling:

replication_factor set to 1,

An RF of 1 means that any particular bit of data exists on exactly one node. So if you are testing read speed by reading the same data item again and again as fast as you can, then all the reads will be coming from the same one node, the one that has that data item on it. In this situation adding more nodes won't help. Maybe this isn't exactly how you are testing read speed, but perhaps you are doing something analogous? I suggest you explain exactly how you are measuring read speed.
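You can check this directly by asking Cassandra which node(s) own a given key; with an RF of 1 it prints exactly one address per key (keyspace, table, and key here are placeholders):

    nodetool getendpoints perf_test rows key-12345

If the keys you are reading all map to the same node, adding more nodes won't speed those reads up.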

Ciao, Duncan.

SimpleStrategy,
default consistency level,
default compaction strategy (size tiered),
but compacted down to 1 sstable per cf on each node (versus using leveled compaction for read performance)

*Read Performance Results:*

1 client thread - 2 nodes > 1 node was seen, but we couldn't show increased performance when adding more nodes, i.e. 4 nodes was not > 2 nodes.
2 client threads - 2 nodes > 1 node still held, but again we couldn't show increased performance when adding more nodes, i.e. 4 nodes was not > 2 nodes.
10 client threads - this time 2 nodes < 1 node on the performance numbers; 2 nodes suffered a larger reduction in throughput than 1 node.

Where are we going wrong?

How have others shown horizontal scaling for reads?

Thanks,
Diane

 