From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: tuning for read performance
Date: Tue, 23 Oct 2012 20:30:09 +1300

>> and nodetool tpstats always shows pending tasks in the ReadStage.

Are clients reading a single row at a time or multiple rows? Each row requested in a multiget becomes a task in the read stage.
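
For example (a rough pycassa sketch; the keyspace, column family and key names are invented for illustration), a multiget of 50 keys is one client call but roughly 50 read tasks queued on the replicas, which is often what keeps the ReadStage pending count up:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160'])
    docs = pycassa.ColumnFamily(pool, 'Documents')

    # one row, one read task
    row = docs.get('doc-00001', columns=['title', 'owner'])

    # one client call, but ~50 read tasks server side
    rows = docs.multiget(['doc-%05d' % i for i in range(50)],
                         columns=['title', 'owner'])

If the client really does need many rows at once, issuing the multiget in smaller batches spreads the work out instead of dumping it on the read stage in one hit.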

Also look at the type of query you are sending. I talked a little about the performance of different query techniques at Cassandra SF: http://www.datastax.com/events/cassandrasummit2012/presentations

> 1. Consider Leveled compaction instead of Size Tiered. LCS improves
> read performance at the cost of more writes.

I would look at other options first.
If you want to know how many SSTables a read is hitting, look at nodetool cfhistograms.
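
For reference, cfhistograms is run per column family, e.g.:

    nodetool -h <host> cfhistograms <keyspace> <column_family>

The SSTables column is a histogram of how many SSTables were touched for each read; if a lot of reads are hitting 3 or more SSTables that lines up with high read latency and makes the compaction settings worth a second look.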

> 2. You said "skinny column family" which I took to mean not a lot of
> columns/row. See if you can organize your data into wider rows which
> allow reading fewer rows and thus fewer queries/disk seeks.

Wide rows take longer to read than narrow ones, so artificially widening your rows may make reads slower rather than faster.
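
As a rough illustration (sketch only, all names invented), the same documents can be modelled as one very wide row per user or as one narrow row per document. The wide layout saves key lookups, but every read then has to slice into a much larger row, so wider is not automatically faster:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160'])

    # wide: row key = user id, one column per document
    docs_by_user = pycassa.ColumnFamily(pool, 'DocsByUser')
    latest = docs_by_user.get('user-42', column_count=20, column_reversed=True)

    # narrow: row key = document id, a handful of columns per row
    documents = pycassa.ColumnFamily(pool, 'Documents')
    doc = documents.get('doc-00001')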


> 4. Splitting your data from your MetaData could definitely help. I
> like separating my read heavy from write heavy CF's because generally
> speaking they benefit from different compaction methods. But don't go
> crazy creating 1000's of CF's either.

+1
25 ms read latency is high.
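
On the metadata split, a sketch of what it could look like (pycassa, all CF and column names invented): the small, hot metadata lives in its own CF and the 1-100K document body is only fetched when it is actually wanted, so the common reads stay small:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160'])
    meta = pycassa.ColumnFamily(pool, 'DocumentMeta')   # small, read-heavy columns
    body = pycassa.ColumnFamily(pool, 'DocumentBody')   # large, rarely read blob

    def write_doc(doc_id, metadata, content):
        meta.insert(doc_id, metadata)
        body.insert(doc_id, {'content': content})

    def read_metadata(doc_id):
        # never drags the large document blob off disk
        return meta.get(doc_id)

The two CFs can then be given different compaction and caching settings, which is the read-heavy vs write-heavy point above.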

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/10/2012, at 9:06 AM, Aaron Turner <synfinatic@gmail.com> wrote:

> On Mon, Oct 22, 2012 at 11:05 AM, feedly team <feedlydev@gmail.com> wrote:
>> Hi,
>>    I have a small 2 node cassandra cluster that seems to be constrained by
>> read throughput. There are about 100 writes/s and 60 reads/s mostly against
>> a skinny column family. Here's the cfstats for that family:

>> SSTable count: 13
>> Space used (live): 231920026568
>> Space used (total): 231920026568
>> Number of Keys (estimate): 356899200
>> Memtable Columns Count: 1385568
>> Memtable Data Size: 359155691
>> Memtable Switch Count: 26
>> Read Count: 40705879
>> Read Latency: 25.010 ms.
>> Write Count: 9680958
>> Write Latency: 0.036 ms.
>> Pending Tasks: 0
>> Bloom Filter False Postives: 28380
>> Bloom Filter False Ratio: 0.00360
>> Bloom Filter Space Used: 874173664
>> Compacted row minimum size: 61
>> Compacted row maximum size: 152321
>> Compacted row mean size: 1445

>> iostat shows almost no write activity, here's a typical line:

>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sdb               0.00     0.00  312.87    0.00     6.61     0.00    43.27    23.35  105.06   2.28  71.19

>> and nodetool tpstats always shows pending tasks in the ReadStage. The data
>> set has grown beyond physical memory (250GB/node w/64GB of RAM) so I know
>> disk access is required, but are there particular settings I should
>> experiment with that could help relieve some read i/o pressure? I already
>> put memcached in front of cassandra so the row cache probably won't help
>> much.

>> Also this column family stores smallish documents (usually 1-100K) along
>> with metadata. The document is only occasionally accessed, usually only the
>> metadata is read/written. Would splitting out the document into a separate
>> column family help?


> Some un-expert advice:

> 1. Consider Leveled compaction instead of Size Tiered. LCS improves
> read performance at the cost of more writes.

> 2. You said "skinny column family" which I took to mean not a lot of
> columns/row. See if you can organize your data into wider rows which
> allow reading fewer rows and thus fewer queries/disk seeks.

> 3. Enable compression if you haven't already.

> 4. Splitting your data from your MetaData could definitely help. I
> like separating my read heavy from write heavy CF's because generally
> speaking they benefit from different compaction methods. But don't go
> crazy creating 1000's of CF's either.

> Hope that gives you some ideas to investigate further!


> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>    -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
