From: Arindam Barua <abarua@247-inc.com>
To: user@cassandra.apache.org
Subject: RE: Config changes to leverage new hardware
Date: Tue, 26 Nov 2013 01:58:29 +0000

 

Here are some calculated 'latency' results reported by cassandra-stress when asked to write 10M rows, i.e.

cassandra-stress -d <ip1>,<ip2> -n 10000000

(we actually had cassandra-stress running in daemon mode for the below tests)
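
(A read pass with the same tool would look something like

    cassandra-stress -d <ip1>,<ip2> -n 10000000 -o read

assuming the -o/--operation flag of the stress tool shipped with 1.1/1.2; flag names and accepted values may differ between versions.)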

 

avg_latency (percentile)                    90           99           99.9         99.99
Write: 8 cores, 32 GB, 3-disk RAID 0        0.002982182  0.003963931  0.004692996  0.004792326
Write: 32 cores, 128 GB, 7-disk RAID 0      0.003157515  0.003763181  0.005184429  0.005441946
Read: 8 cores, 32 GB, 3-disk RAID 0         0.002289879  0.057178021  0.173753058  0.24386912
Read: 32 cores, 128 GB, 7-disk RAID 0       0.002317525  0.010937648  0.013205977  0.014270511

 

The client was another node on the same network with the 8 core, 32 GB RAM specs. I wouldn't expect it to bottleneck, but I can monitor it while generating the load. In general, what would you expect it to bottleneck at?

 

>> Another interesting thing is that the linux disk cache doesn't seem to be growing in spite of a lot of free memory available.

> Things will only get paged in when they are accessed.

Hmm, interesting. I did a test where I just wrote large files to disk, e.g.

dd if=/dev/zero of=bigfile18 bs=1M count=10000

and checked the disk cache, and it increased by exactly the same size as the file written (no reads were done in this case).
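
For anyone who wants to repeat that check, a minimal sketch; it assumes the older procps layout of 'free', where the cached figure is the 7th field of the Mem: line (newer versions report a combined buff/cache column instead):

    free -m | awk 'NR==2 {print "cached before (MB):", $7}'   # page cache before the write
    dd if=/dev/zero of=bigfile18 bs=1M count=10000             # ~10 GB written through the page cache
    free -m | awk 'NR==2 {print "cached after (MB):", $7}'    # should grow by roughly the file size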

 

-----Original Message-----
From: Aaron Morton [mailto:aaron@thelastpickle.com]
Sent: Monday, November 25, 2013 11:55 AM
To: Cassandra User
Subject: Re: Config changes to leverage new hardware

 

> However, for both writes and reads there was virtually no difference in the latencies.

What sort of latency were you getting ?

 

> I’m still not very sure where the curr= ent *write* bottleneck is though.

What numbers are you getting ?

Could the bottleneck be the client ? Can it send writes fast enough to saturate the nodes ?

 

As a rule of thumb you should get 3,000 to 4,000 (non counter) writes per second per core.
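
(Applying that rule of thumb to the two machines, and optimistically counting hyperthreaded cores as full cores, the ceilings would be very roughly 8 x 3,000-4,000 = 24,000-32,000 writes/s on the old hardware and 32 x 3,000-4,000 = 96,000-128,000 writes/s on the new one.)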

 

> Sample iostat data (captured every 10s) for the dedicated disk where commit logs are written is below. Does this seem like a bottleneck?

Does not look too bad.

 

> Another interesting thing is that the linux disk cache doesn't seem to be growing in spite of a lot of free memory available.

Things will only get paged in when they are accessed.

 

Cheers

 

 

-----------------

Aaron Morton

New Zealand

@aaronmorton

 

Co-Founder & Principal Consultant

Apache Cassandra Consulting

http://www.thelastpickle.com

 

On 21/11/2013, at 12:42 pm, Arindam Barua <abarua@247-inc.com> wrote:

 

> Thanks for the suggestions Aaron.

> As a follow up, we ran a bunch of tests with different combinations of these changes on a 2-node ring. The load was generated using cassandra-stress, run with default values to write 30 million rows, and read them back.

> However, for both writes and reads there was virtually no difference in the latencies.

> The different combinations attempted:

> 1. Baseline test with none of the below changes.

> 2. Grabbing the TLAB setting from 1.2

> 3. Moving the commit logs too to the 7 disk RAID 0.

> 4. Increasing the concurrent_read to 32, and concurrent_write to 64

> 5. (3) + (4), i.e. moving commit logs to the RAID + increasing concurrent_read and concurrent_write config to 32 and 64.

> The write latencies were very similar, except that they were ~3x worse for the 99.9th percentile and above for scenario (5).

> The read latencies were also similar, with (3) and (5) being a little worse for the 99.99th percentile.

> Overall, not making any changes, i.e. (1) performed as well or slightly better than any of the other changes.

> Running cassandra-stress on both the old and new hardware without making any config changes, the write performance was very similar, but the new hardware did show ~10x improvement in the read for the 99.9th percentile and higher. After thinking about this, the reason why we were not seeing any difference with our test framework was perhaps the nature of the test, where we write the rows and then immediately read back the rows that were just written. The data is read back from the memtables, and never from the disk/sstables. Hence the new hardware's increased RAM and size of the disk cache or higher number of disks never helps.
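
(One way to make the read phase of a test like this actually touch the sstables instead of the memtables is to flush between the write and read passes, e.g.

    nodetool -h <host> flush <keyspace>

on each node; the keyspace argument is optional, and the keyspace name depends on how the stress tool was run.)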

> I’m still not very sure where the curr= ent *write* bottleneck is though. The new hardware has 32 cores vs 8 cores = of the old hardware. Moving the commit log from a dedicated disk to a 7 RAI= D-0 disk system (where it would be shared by other data though) didn’t make a difference too. (unless the extra c= ontention on the RAID nullified the positive effects of the RAID).

> Sample iostat data (captured every 10s) for the dedicated disk where commit logs are written is below. Does this seem like a bottleneck? When the commit logs are written the await/svctm ratio is high.

> Device:   rrqm/s   wrqm/s    r/s     w/s    rMB/s   wMB/s   avgrq-sz   avgqu-sz   await   svctm   %util
>             0.00     8.09   0.04    8.85     0.00    0.07      15.74       0.00    0.12    0.03    0.02
>             0.00   768.03   0.00    9.49     0.00    3.04     655.41       0.04    4.52    0.33    0.31
>             0.00     8.10   0.04    8.85     0.00    0.07      15.75       0.00    0.12    0.03    0.02
>             0.00   752.65   0.00   10.09     0.00    2.98     604.75       0.03    3.00    0.26    0.26
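
(For anyone wanting to capture the same view: the columns match the extended device report of sysstat's iostat in MB/s on a 10-second interval, i.e. something along the lines of

    iostat -x -m <commitlog-device> 10

where <commitlog-device> is whatever the dedicated commit log disk is called.)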

> Another interesting thing is that the linux disk cache doesn't seem to be growing in spite of a lot of free memory available. The total disk cache used reported by 'free' is less than the size of the sstables written, with over 100 GB unused RAM.

> Even in production, where we have the older hardware running with 32 GB RAM for a long time now, looking at 5 hosts in 1 DC, only 2.5 GB to 8 GB was used for the disk cache. The Cassandra java process uses the 8 GB allocated to it, and at least 10-15 GB on all the hosts is not used at all.

> Thanks,

> Arindam

> From: Aaron Morton [mailto:aaron@thelastpickle.com]

> Sent: Wednesday, November 06, 2013 8:34 PM

> To: Cassandra User

> Subject: Re: Config changes to leverage new hardware

> Running Cassandra 1.1.5 currently, but evaluating to upgrade to 1.2.11 soon.

> You will make more use of the extra memory moving to 1.2 as it moves bloom filters and compression data off heap.

> Also grab the TLAB setting from cassandra-env.sh in v1.2
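
(For reference, the setting being referred to should be the TLAB JVM option in the 1.2 cassandra-env.sh, something like

    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"

though the exact surrounding JVM flags vary between releases.)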

> As of now, our performance tests (our application specific as well as cassandra-stress) are not showing any significant difference between the two hardware configurations, which is a little disheartening, since the new hardware has a lot more RAM and CPU.

> For reads or writes or both ?

> Writes tend to scale with cores as long as the commit log can keep up.

> Reads improve with disk IO and page cache size when the hot set is in memory.

> Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks (1 disk used for commitlog and 3 disks RAID 0 for data)

> New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB disks (1 disk used for commitlog and 7 disks RAID 0 for data)

> Is the disk IO on the commit log volume keeping up ?

> You cranked up the concurrent writers and the commit log may not keep up. You could put the commit log on the same RAID volume to see if that improves writes.
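
(In cassandra.yaml terms that experiment is just repointing the commit log at the data RAID; a sketch with placeholder paths:

    commitlog_directory: /raid0/cassandra/commitlog
    data_file_directories:
        - /raid0/cassandra/data

These are the same two settings involved in scenario (3) from the list above.)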

> The config we tried modifying so far was concurrent_reads to (16 * number of drives) and concurrent_writes to (8 * number of cores) as per

> 256 write threads is a lot. Make sure the commit log can keep up, I would put it back to 32, maybe try 64. Not sure the concurrent list for the commit log will work well with that many threads.

> May want to put the reads down as well.
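
(As cassandra.yaml settings the suggestion works out to something like

    concurrent_writes: 32    # or perhaps 64, rather than the 256 derived from 8 * cores
    concurrent_reads: 32

where the values are just the ones mentioned above, not official recommendations.)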

> It’s easier to tune the system if you = can provide some info on the workload.

> Cheers

> -----------------

> Aaron Morton

> New Zealand

> @aaronmorton

> Co-Founder & Principal Consultant

> Apache Cassandra Consulting

> http://www.thelastpickle.com

> On 7/11/2013, at 12:35 pm, Arindam Barua <abarua@247-inc.com> wrote:

>

>

> We want to upgrade our Cassandra cluster to have newer hardware, and were wondering if anyone has suggestions on Cassandra or linux config changes that will prove to be beneficial.

> As of now, our performance tests (our application specific as well as cassandra-stress) are not showing any significant difference between the two hardware configurations, which is a little disheartening, since the new hardware has a lot more RAM and CPU.

> Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks (1 disk used for commitlog and 3 disks RAID 0 for data)

> New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB disks (1 disk used for commitlog and 7 disks RAID 0 for data)

> Most of the cassandra config currently is the default, and we are using the LeveledCompactionStrategy. Default key cache, row cache turned off.

> The config we tried modifying so far was concurrent_reads to (16 * number of drives) and concurrent_writes to (8 * number of cores) as per recommendation in cassandra.yaml, but that didn't make much difference.
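
(On the new hardware those formulas work out to concurrent_writes: 256 (8 * 32 cores, the "256 write threads" figure above) and concurrent_reads: 112 (16 * the 7 data disks).)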

> We were hoping that at least the extra RAM in the new hardware would be used for Linux file caching and hence an improvement in performance would be observed.

> Running Cassandra 1.1.5 currently, but evaluating to upgrade to 1.2.11 soon.

> Thanks,

> Arindam

 
