Reply-To: user@cassandra.apache.org
From: aaron morton <aaron@thelastpickle.com>
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_671A7ED3-3BD1-4766-8AA9-FCF023F5F6BE"
Subject: Re: single row key continues to grow, should I be concerned?
Date: Wed, 21 Mar 2012 06:37:42 +1300
In-Reply-To: 
 <CAPd80sihmtSr6bxpZeboxWA_QEsi9k0izv681JYwDumUFQ5GDg@mail.gmail.com>
To: user@cassandra.apache.org
References: 
 <CAPd80sihmtSr6bxpZeboxWA_QEsi9k0izv681JYwDumUFQ5GDg@mail.gmail.com>
Message-Id: <B7570C0F-9931-489A-8526-778895464BC3@thelastpickle.com>


--Apple-Mail=_671A7ED3-3BD1-4766-8AA9-FCF023F5F6BE
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

> The reads are only fetching slices of 20 to 100 columns max at a time
> from the row but if the key is planted on one node in the cluster I am
> concerned about that node getting the brunt of traffic.
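What replication factor (RF) are you using, how many nodes are in the
cluster, and what consistency level (CL) do you read at? Every read of
that key can only be served by the RF replicas that own it, so those
numbers decide how far the load spreads.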

If you have lots of nodes in different racks, NetworkTopologyStrategy
will do a better job of distributing read load than SimpleStrategy.
The DynamicSnitch can also help distribute load; see cassandra.yaml for
its configuration.
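To make the point about replicas concrete, here is a toy model in
Python (not Cassandra's code) of SimpleStrategy-style placement: each
node owns one token, the row key is hashed with MD5 as RandomPartitioner
does, and the RF replicas are the next RF nodes clockwise on the ring.
The six-node cluster, the evenly spaced tokens and the key
"popular_area" are all hypothetical.

```python
import bisect
import hashlib

# Toy model only, not Cassandra's implementation: one token per node,
# MD5 row-key hashing (as RandomPartitioner does) and SimpleStrategy-style
# placement on the next `rf` nodes clockwise from the key's token.
RING_SIZE = 2 ** 127


def key_token(row_key):
    """Hash a row key to a position on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(row_key, node_tokens, rf):
    """Return the rf nodes holding row_key, in clockwise ring order."""
    ring = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [token for _, token in ring]
    # The first node whose token is >= the key's token owns it; wrap at end.
    start = bisect.bisect_left(tokens, key_token(row_key)) % len(ring)
    return [ring[(start + i) % len(ring)][0] for i in range(rf)]


# Hypothetical six-node cluster with evenly spaced tokens.
nodes = {"node%d" % i: i * RING_SIZE // 6 for i in range(6)}
hot = replicas_for("popular_area", nodes, 3)
print(hot)  # every read of "popular_area" is served by these 3 nodes
```

However many nodes the cluster has, one hot key only ever touches its
RF replicas. Raising RF adds replicas that can serve it, and a lower CL
such as ONE lets each read be answered by just one of them.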

> I thought about breaking the column data into multiple different row
> keys to help distribute throughout the cluster but it's so darn handy
> having all the columns in one key!!
If you have a row that will grow continually, it is a good idea to
partition it in some way. Large rows slow down operations such as
compaction and repair, and above roughly 60MB the effect becomes
noticeable. Can you partition by a date range, such as month?
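A minimal sketch of month partitioning, assuming a hypothetical key
format of "<base>:<YYYY-MM>": writes go to the row for the current
month, and a read that needs more columns walks back one month at a
time.

```python
from datetime import date

# Sketch: split one ever-growing row into one row per month.
# The "<base>:<YYYY-MM>" key format is a hypothetical naming choice.


def month_key(base, day):
    """Row key for the month containing `day`."""
    return "%s:%04d-%02d" % (base, day.year, day.month)


def previous_month(day):
    """First day of the month before the one containing `day`."""
    if day.month > 1:
        return date(day.year, day.month - 1, 1)
    return date(day.year - 1, 12, 1)


def recent_keys(base, today, months):
    """Row keys to read, newest month first, covering `months` months."""
    keys = []
    day = date(today.year, today.month, 1)
    for _ in range(months):
        keys.append(month_key(base, day))
        day = previous_month(day)
    return keys


# Writes for 20 March 2012 go to the current month's row...
print(month_key("popular_area", date(2012, 3, 20)))  # popular_area:2012-03
# ...and a read that needs older columns walks back one month at a time.
print(recent_keys("popular_area", date(2012, 3, 20), 3))
```

Each month's row hashes to its own token, so older months land on
different nodes, but the current month's row is still the hot one.
Bucketing mainly keeps each row's size bounded.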

Large rows are also a little slower to query; see
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
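A simplified model of why slice cost grows with row size: columns in a
row are stored sorted by name and grouped into blocks (sized by
column_index_size_in_kb in cassandra.yaml, 64KB by default), and the row
carries an index holding the first column name of each block. A slice
binary-searches that index for its start block, so the search itself
stays cheap, but the index grows linearly with the row. This sketch
counts columns per block instead of bytes, and the 250-column block
size is an arbitrary assumption.

```python
import bisect

# Simplified model of a row's column index: columns are kept sorted by
# name and grouped into fixed-size blocks, and the index stores the first
# column name of each block. Real blocks are sized in bytes
# (column_index_size_in_kb); counting columns here is a simplification.


def build_index(column_names, per_block):
    """First column name of each block of `per_block` sorted columns."""
    names = sorted(column_names)
    return [names[i] for i in range(0, len(names), per_block)]


def block_for(index, start_name):
    """Block in which a slice starting at `start_name` begins."""
    return max(bisect.bisect_right(index, start_name) - 1, 0)


# Hypothetical 125,000-column row with 250 columns per block.
names = ["c%06d" % i for i in range(125000)]
index = build_index(names, 250)
print(len(index))                   # 500 index entries for this row
print(block_for(index, "c001234"))  # 4: the block starting at c001000
```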

If most reads only pull 20 to 100 columns at a time, are there two
workloads? Is it possible to store just those columns in a separate row?
If you know how big that row may get, you may be able to use the row
cache to improve performance; the row cache keeps entire rows in
memory, so it only suits rows of a known, bounded size.
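A sketch of the separate-row idea, using an in-memory stand-in rather
than a real client: every write also goes to a "recent" row capped at
CAP columns, with the oldest trimmed, so the 20 to 100 column reads hit
a small row whose size you can bound before enabling the row cache.
CAP, the 15-byte per-column overhead and the sample column sizes are
assumptions, and column names are assumed to sort by time (e.g.
zero-padded timestamps).

```python
# Sketch of a separate, capped "recent" row. This is an in-memory stand-in,
# not a Cassandra client. CAP and COLUMN_OVERHEAD are assumptions, and
# column names are assumed to sort by time (e.g. zero-padded timestamps).
CAP = 500
COLUMN_OVERHEAD = 15  # assumed bytes per column: lengths, flags, timestamp


class RecentRow:
    def __init__(self, cap):
        self.cap = cap
        self.columns = {}  # column name -> value

    def insert(self, name, value):
        """Write a column, then trim the oldest ones beyond the cap."""
        self.columns[name] = value
        while len(self.columns) > self.cap:
            # In Cassandra this trim is a delete: a tombstone that stays
            # until gc_grace_seconds has passed and compaction removes it.
            del self.columns[min(self.columns)]

    def newest(self, count):
        """The `count` newest columns, like a reversed slice."""
        names = sorted(self.columns, reverse=True)[:count]
        return [(name, self.columns[name]) for name in names]

    def estimated_bytes(self):
        """Rough size: names + values + the assumed per-column overhead."""
        return sum(len(name) + len(value) + COLUMN_OVERHEAD
                   for name, value in self.columns.items())


row = RecentRow(CAP)
for i in range(2000):
    row.insert("%010d" % i, "x" * 200)  # hypothetical 200-byte values
print(len(row.columns), row.newest(3)[0][0])  # 500 0000001999
print(row.estimated_bytes())  # 500 * (10 + 200 + 15) = 112500
```

For comparison, if the existing 125,000-column row had columns of the
same assumed size, it would come to about 125,000 * 225 = 28,125,000
bytes (roughly 28MB), all of which the row cache would have to hold.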

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/03/2012, at 2:05 PM, Blake Starkenburg wrote:

> I have a row key which is now up to 125,000 columns (and anticipated
> to grow). I know this is a far cry from the 2 billion columns a single
> row key can store in Cassandra, but my concern is the number of reads
> that this specific row key may get compared to other row keys. This
> particular row key houses column data associated with one of the more
> popular areas of the site. The reads are only fetching slices of 20 to
> 100 columns max at a time from the row but if the key is planted on
> one node in the cluster I am concerned about that node getting the
> brunt of traffic.
>
> I thought about breaking the column data into multiple different row
> keys to help distribute throughout the cluster but it's so darn handy
> having all the columns in one key!!
>
> key_cache is enabled but row cache is disabled on the column family.
>
> Should I be concerned going forward? Any particular advice on large
> wide rows?
>
> Thanks!



--Apple-Mail=_671A7ED3-3BD1-4766-8AA9-FCF023F5F6BE--