From: Diane Griffith <dfgriffith@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 18 Jul 2014 09:01:59 -0400
Subject: Re: horizontal query scaling issues follow on
Working on getting some samples, but I grabbed the last part of the nodetool cfhistograms output for one of the column families on one of the nodes. What does it mean for the partition information:

Partition Size (bytes)
1109 bytes: 18000000

Cell Count per Partition
8 cells: 18000000

Am I right that I can't glean anything from this about how the data was partitioned, or whether a key was broken across partitions? Does it mean that each of the 18000000 unique keys has 8 cells?
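A hypothetical spot check against one key, using the foo table described further down in this thread and a placeholder key value, would be something like:

    SELECT count(*) FROM foo WHERE key = '<one of the generated keys>';

which reports how many col_name entries that single partition actually holds.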

Thanks,
Diane


On Fri, Jul 18, 2014 at 6:46 AM, Benedict Elliott Smith <belliottsmith@datastax.com> wrote:
How many columns are you inserting/querying per key? Could we see some example CQL statements for the insert/read workload?

If you are maxing out at 10 clients, something fishy is going on. In general, though, if you find that adding nodes causes performance to degrade, I would suspect that you are querying data in one CQL statement that is spread over multiple partitions, and so extra work needs to be done cross-cluster to service your requests as more nodes are added.

I would also consider what effect the file cache may be having on your workload: it sounds small enough to fit in memory, so the cache is likely a major determining factor in your benchmark's performance. As you try different client levels on the smaller cluster you may see improved performance as the data is pulled into the file cache across test runs; when you then build your larger cluster that warm cache is lost, so performance appears to degrade (for instance).


On Fri, Jul 18, 2014 at 12:25 PM, Diane Griffith <dfgriffith@gmail.com> wrote:
The column family schema is:

CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text, PRIMARY KEY(key, col_name))

where the key is a generated uuid and all keys were inserted in random order, but in the end we were compacting down to one sstable per node.

So we were doing it this way to achieve dynamic columns.
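For illustration, a sketch of what the insert and point-read statements against this table might look like, with made-up values; this is not our exact scripted CQL:

    -- write path: one insert per dynamic column under a given key
    INSERT INTO foo (key, col_name, col_value)
    VALUES ('8f0c1d2a-made-up-uuid', 'attr1', 'value1');

    -- read path: fetch all columns for one randomly chosen key
    SELECT col_name, col_value FROM foo WHERE key = '8f0c1d2a-made-up-uuid';

Each such read targets a single partition, since key is the partition key in this schema.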

Thanks,
Diane

On Fri, Jul 18, 2014 at 12:19 AM, Jack Krupansky <jack@basetechnology.com> wrote:
Sorry, I may have confused the discussion by mentioning tokens – I wasn't intending to refer to vnodes or the num_tokens property, but merely referring to the token range of a node and the fact that the partition key hashes to a token value.

The main question is what you use for your primary key: are you using a small number of partition keys and a large number of clustering columns, or does each row have a unique partition key and no clustering columns?
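A hypothetical sketch of the two shapes, with made-up table and column names, to make the distinction concrete:

    -- few partition keys, many clustering columns per partition
    CREATE TABLE readings_by_sensor (
        sensor_id text,
        reading_time timestamp,
        value text,
        PRIMARY KEY (sensor_id, reading_time));

    -- a unique partition key per row, no clustering columns
    CREATE TABLE readings_by_id (
        reading_id uuid,
        value text,
        PRIMARY KEY (reading_id));

In the first shape all rows for a given sensor_id share one partition; in the second, each row is its own partition and hashes to its own token.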
-- Jack Krupansky
From: Diane Griffith
Sent: Thursday, July 17, 2014 6:21 PM
To: user
Subject: Re: horizontal query scaling issues follow on

So do partitions equate to tokens/vnodes?

If so, we had configured all cluster nodes/vms with num_tokens: 256 instead of setting initial_token and assigning ranges. I am still not getting why, in Cassandra 2.0, I would assign my own ranges via initial_token; our choice was based on the documentation and even this blog item, which made it seem right for us to always configure our cluster vms with num_tokens: 256 in the cassandra.yaml file.

Also, in all testing, all vms were of equal sizing, so one was not more powerful than another.

I didn't think I was hitting an i/o wall on the client vm (a separate vm) where we command-line scripted our query call to the cassandra cluster. I can break the client call load across vms, which I tried early on. Happy to verify that again though.

So given that, I was assuming the partitions were such that it wasn't a problem. Is that an incorrect assumption and something to dig into more?
Thanks,
Diane


On Thu, Jul 17, 2014 at 3:01 PM, Jack Krupansky <jack@basetechnology.com> wrote:
How many partitions are you spreading those 18 million rows over? That many rows in a single partition will not be a sweet spot for Cassandra. It's not exceeding any hard limit (2 billion), but some internal operations may cache the partition rather than the logical row.

And all those rows in a single partition would certainly not be a test of "horizontal scaling" (adding nodes to handle more data – more token values or partitions).

-- Jack Krupansky
From: Diane Griffith
Sent: Thursday, July 17, 2014 1:33 PM
To: user
Subject: horizontal query scaling issues follow on

This is a follow-on re-post to clarify what we are trying to do, providing information that was missing or not clear.

Goal: Verify horizontal scaling for random, non-duplicating key reads using the simplest configuration (or minimal configuration) possible.

Background:

A couple of years ago we did similar performance testing with Cassandra for both read and write performance and found excellent (essentially linear) horizontal scalability. That project got put on hold. We are now moving forward with an operational system and are having scaling problems.

During the prior testing (3 years ago) we were using a much older version of Cassandra (0.8 or older), the Thrift API, and Amazon AWS rather than OpenStack VMs. We are now using the latest Cassandra and the CQL interface. We did try moving from OpenStack to AWS/EC2, but that did not materially change our (poor) results.

Test Procedure:

  • Inserted 54 million cells in 18 million rows (so 3 cells per row), using randomly generated row keys. That was to be our data control for the test (see the sketch after this list).
  • Spawn a client on a different VM to query 100k rows and do that for 100 reps. Each row key queried is drawn randomly from the set of existing row keys, and then not re-used, so all 10 million row queries use a different (valid) row key. This test is a specific use case of our system that we are trying to show will scale.
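A hypothetical sketch of that per-key layout, assuming each of the 3 cells is one col_name entry in the foo table shown earlier in the thread (made-up key and values):

    INSERT INTO foo (key, col_name, col_value) VALUES ('<generated-uuid>', 'colA', 'valueA');
    INSERT INTO foo (key, col_name, col_value) VALUES ('<generated-uuid>', 'colB', 'valueB');
    INSERT INTO foo (key, col_name, col_value) VALUES ('<generated-uuid>', 'colC', 'valueC');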

Result:

  • 2 nodes performed better than the 1-node test, but 4 nodes showed decreased performance over 2 nodes. So that did not show horizontal scaling.

Notes:

  • We have the replication factor set to 1, as we were trying to keep the control test simple to prove out horizontal scaling (see the keyspace sketch after this list).
  • When we tried to add threading to see if it would help, it had interesting side behavior which did not prove out horizontal scaling.
  • We are using CQL versus the Thrift API for Cassandra 2.0.6.
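For reference, a hypothetical keyspace definition matching the replication-factor note (the keyspace name is made up):

    CREATE KEYSPACE IF NOT EXISTS perf_test
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};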


Does anyone have any feedback on whether either threading or replication factor is necessary to show horizontal scaling of Cassandra, versus the minimal way of just continuing to add nodes to help throughput?


Any suggestions on the minimal configuration necessary to show scaling for our query use case: 100k requests for random, non-repeating keys constantly coming in over a period of time?


Thanks,

Diane
