From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: tuning for read performance
Date: Tue, 23 Oct 2012 20:30:09 +1300

>> and nodetool tpstats always shows pending tasks in the ReadStage.

Are clients reading a single row at a time or multiple rows? Each row requested in a multiget becomes a task in the read stage.
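
For example (a rough pycassa sketch; the keyspace, column family and key names are invented for illustration), a multiget of 50 keys is one client call but roughly 50 read tasks queued on the replicas, which is often what keeps the ReadStage pending count up:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160'])
    docs = pycassa.ColumnFamily(pool, 'Documents')

    # one row, one read task
    row = docs.get('doc-00001', columns=['title', 'owner'])

    # one client call, but ~50 read tasks server side
    rows = docs.multiget(['doc-%05d' % i for i in range(50)],
                         columns=['title', 'owner'])

If the client really does need many rows at once, issuing the multiget in smaller batches spreads the work out instead of dumping it on the read stage in one hit.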

Also look at the type of query you are sending. I talked a little about the performance of different query techniques at Cassandra SF: http://www.datastax.com/events/cassandrasummit2012/presentations

> 1. Consider Leveled compaction instead of Size Tiered. LCS improves
> read performance at the cost of more writes.

I would look at other options first.
If you want to know how many SSTables a read is hitting, look at nodetool cfhistograms.
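
For reference, cfhistograms is run per column family, e.g.:

    nodetool -h <host> cfhistograms <keyspace> <column_family>

The SSTables column is a histogram of how many SSTables were touched for each read; if a lot of reads are hitting 3 or more SSTables that lines up with high read latency and makes the compaction settings worth a second look.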

> 2. You said "skinny column family" which I took to mean not a lot of
> columns/row. See if you can organize your data into wider rows which
> allow reading fewer rows and thus fewer queries/disk seeks.

Wide rows take longer to read than narrow ones, so artificially widening your rows may make reads slower rather than faster.
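
As a rough illustration (sketch only, all names invented), the same documents can be modelled as one very wide row per user or as one narrow row per document. The wide layout saves key lookups, but every read then has to slice into a much larger row, so wider is not automatically faster:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160'])

    # wide: row key = user id, one column per document
    docs_by_user = pycassa.ColumnFamily(pool, 'DocsByUser')
    latest = docs_by_user.get('user-42', column_count=20, column_reversed=True)

    # narrow: row key = document id, a handful of columns per row
    documents = pycassa.ColumnFamily(pool, 'Documents')
    doc = documents.get('doc-00001')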


> 4. Splitting your data from your MetaData could definitely help. I
> like separating my read heavy from write heavy CF's because generally
> speaking they benefit from different compaction methods. But don't go
> crazy creating 1000's of CF's either.

+1
25 ms read latency is high.
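
On the metadata split, a sketch of what it could look like (pycassa, all CF and column names invented): the small, hot metadata lives in its own CF and the 1-100K document body is only fetched when it is actually wanted, so the common reads stay small:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160', 'node2:9160'])
    meta = pycassa.ColumnFamily(pool, 'DocumentMeta')   # small, read-heavy columns
    body = pycassa.ColumnFamily(pool, 'DocumentBody')   # large, rarely read blob

    def write_doc(doc_id, metadata, content):
        meta.insert(doc_id, metadata)
        body.insert(doc_id, {'content': content})

    def read_metadata(doc_id):
        # never drags the large document blob off disk
        return meta.get(doc_id)

The two CFs can then be given different compaction and caching settings, which is the read-heavy vs write-heavy point above.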

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/10/2012, at 9:06 AM, Aaron Turner <synfinatic@gmail.com> wrote:

> On Mon, Oct 22, 2012 at 11:05 AM, feedly team <feedlydev@gmail.com> wrote:
>> Hi,
>>    I have a small 2 node cassandra cluster that seems to be constrained by
>> read throughput. There are about 100 writes/s and 60 reads/s mostly against
>> a skinny column family. Here's the cfstats for that family:

>> SSTable count: 13
>> Space used (live): 231920026568
>> Space used (total): 231920026568
>> Number of Keys (estimate): 356899200
>> Memtable Columns Count: 1385568
>> Memtable Data Size: 359155691
>> Memtable Switch Count: 26
>> Read Count: 40705879
>> Read Latency: 25.010 ms.
>> Write Count: 9680958
>> Write Latency: 0.036 ms.
>> Pending Tasks: 0
>> Bloom Filter False Postives: 28380
>> Bloom Filter False Ratio: 0.00360
>> Bloom Filter Space Used: 874173664
>> Compacted row minimum size: 61
>> Compacted row maximum size: 152321
>> Compacted row mean size: 1445

>> iostat shows almost no write activity, here's a typical line:

>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> sdb               0.00     0.00  312.87    0.00     6.61     0.00    43.27    23.35  105.06   2.28  71.19

>> and nodetool tpstats always shows pending tasks in the ReadStage. The data
>> set has grown beyond physical memory (250GB/node w/64GB of RAM) so I know
>> disk access is required, but are there particular settings I should
>> experiment with that could help relieve some read i/o pressure? I already
>> put memcached in front of cassandra so the row cache probably won't help
>> much.

>> Also this column family stores smallish documents (usually 1-100K) along
>> with metadata. The document is only occasionally accessed, usually only the
>> metadata is read/written. Would splitting out the document into a separate
>> column family help?


> Some un-expert advice:

> 1. Consider Leveled compaction instead of Size Tiered. LCS improves
> read performance at the cost of more writes.

> 2. You said "skinny column family" which I took to mean not a lot of
> columns/row. See if you can organize your data into wider rows which
> allow reading fewer rows and thus fewer queries/disk seeks.

> 3. Enable compression if you haven't already.

> 4. Splitting your data from your MetaData could definitely help. I
> like separating my read heavy from write heavy CF's because generally
> speaking they benefit from different compaction methods. But don't go
> crazy creating 1000's of CF's either.

> Hope that gives you some ideas to investigate further!


> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>    -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
