Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
From: aaron morton <aaron@thelastpickle.com>
Mime-Version: 1.0 (Apple Message framework v1251.1)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_18F8099C-1AF3-46B7-9473-5C1817B2A961"
Subject: Re: Write latency of counter updates across multiple rows
Date: Mon, 6 Feb 2012 09:56:47 +1300
In-Reply-To: 
 <CAPtdq2uPrfPscTmsB7AGvSHH-VHWDH2gU2ha1dEYowGDo_MkOg@mail.gmail.com>
To: user@cassandra.apache.org
References: 
 <CAPtdq2uPrfPscTmsB7AGvSHH-VHWDH2gU2ha1dEYowGDo_MkOg@mail.gmail.com>
Message-Id: <CFD1E809-C815-40C1-A359-76FFB311E325@thelastpickle.com>


--Apple-Mail=_18F8099C-1AF3-46B7-9473-5C1817B2A961
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252

I'm not thinking about counters specifically here, and I'm assuming you
are sending batch mutations of the same size…

The mutations (inserts, counter increments) for a row are turned into a
single task on the server side and are then processed serially. If you
send mutations for 2 rows, they will be turned into two tasks, which can
then be processed in parallel.
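
A rough Python model of that row-to-task behaviour. It is a sketch under
my own assumptions, not Cassandra's code: the names group_by_row,
apply_row_task and apply_batch are made up, rows are plain dicts, and the
pool size of 32 is just the "typical" figure mentioned below.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 32  # assumption: the "typically 32 concurrent threads" figure


def group_by_row(mutations):
    """Collect (row_key, column, delta) mutations into one task per row key."""
    tasks = defaultdict(list)
    for row_key, column, delta in mutations:
        tasks[row_key].append((column, delta))
    return tasks


def apply_row_task(row, row_mutations):
    """One task: apply a single row's mutations one after another."""
    for column, delta in row_mutations:
        row[column] = row.get(column, 0) + delta
    return len(row_mutations)


def apply_batch(mutations, store, pool_size=POOL_SIZE):
    """Run one task per row; tasks for different rows may run in parallel."""
    tasks = group_by_row(mutations)
    # Create the row dicts up front so each worker only touches its own row.
    rows = {key: store.setdefault(key, {}) for key in tasks}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {key: pool.submit(apply_row_task, rows[key], muts)
                   for key, muts in tasks.items()}
    # Leaving the with-block waits for every task to finish.
    return {key: f.result() for key, f in futures.items()}
```

For example, a batch of four increments touching hypothetical rows "m:0"
and "m:1" becomes two tasks: the three "m:0" increments run serially
inside one task, while "m:1" can be handled by another thread.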

There is a point of diminishing returns here. Each row you write to or
read from becomes a task, so if you write to 1,000 rows at once you will
put 1,000 tasks into a thread pool that typically has 32 concurrent
threads. This may block, or add latency to, other requests. It's more of
an issue with reads than with writes.
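
The arithmetic behind that point of diminishing returns, as a small
sketch. It assumes an otherwise idle pool of 32 threads and equal-sized
tasks; real latency also depends on task size, replication and whatever
else is queued. The helper names are hypothetical.

```python
import math

THREADS = 32  # assumption: the "typically 32 concurrent threads" above


def busy_threads(tasks, threads=THREADS):
    """Threads that can work on this request's row tasks at the same time."""
    return min(tasks, threads)


def queued(tasks, threads=THREADS):
    """Tasks left waiting in the queue once every thread is busy."""
    return max(0, tasks - threads)


def waves(tasks, threads=THREADS):
    """Rounds needed to drain all tasks if each task takes the same time."""
    return math.ceil(tasks / threads)


for rows in (10, 100, 1000):
    print(f"{rows:>5} rows: {busy_threads(rows):>2} busy, "
          f"{queued(rows):>3} queued, {waves(rows):>2} wave(s)")
```

Under these assumptions, 10 rows keep 10 threads busy in a single wave;
100 rows saturate all 32 threads and leave 68 tasks queued (4 waves);
1,000 rows leave 968 tasks queued (32 waves), and any other request that
arrives meanwhile waits behind them.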

Does that apply to your situation?

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/02/2012, at 1:19 AM, Amit Chavan wrote:

>
> Hi,
>
> In our use case, we maintain minute-by-minute roll-ups for different
> metrics. These are stored in a counter column family where the row key
> is a composite of the timestamp truncated to the minute and an integer
> from 0 to 9 (the MD5 hash of the metric, mod 10). The column names are
> the metrics we wish to track. Typically, each row has about 100,000
> counters.
>
> We tested two scenarios. The first is as described above. In this
> case we saw a per-write latency of about 80 to 100 microseconds.
>
> In the other scenario, we computed the integer in the row key mod 100
> instead. In this case we observed a per-write latency of 50 to 70
> microseconds.
>
> I would like to understand why counter updates were faster when they
> were spread across more rows.
>
> Cluster summary: 4 nodes running Cassandra 1.0.5, each with 8 cores,
> 32 GB RAM and a 10 GB Cassandra heap. We are using a replication
> factor of 2.
>
>
> -- 
> Thanks!
> Amit Chavan
>
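
For reference, a sketch of the row-key scheme described in the quoted
message. The details are assumptions, not taken from the message: metric
names are UTF-8 encoded, the MD5 digest is read as one big-endian
integer, the minute is stored as epoch seconds, and row_key and the
"metric-N" names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def row_key(metric, ts, buckets=10):
    """Hypothetical composite row key: (minute, md5(metric) mod buckets)."""
    minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
    digest = hashlib.md5(metric.encode("utf-8")).digest()
    bucket = int.from_bytes(digest, "big") % buckets
    return (int(minute.timestamp()), bucket)


ts = datetime(2012, 2, 4, 1, 19, 42, tzinfo=timezone.utc)
metrics = [f"metric-{i}" for i in range(1000)]  # hypothetical metric names
rows_mod_10 = {row_key(m, ts, 10) for m in metrics}
rows_mod_100 = {row_key(m, ts, 100) for m in metrics}
print(len(rows_mod_10), len(rows_mod_100))  # at most 10 and at most 100
```

If MD5 spreads the metrics roughly evenly, about 100,000 counters per row
at mod 10 implies roughly 1,000,000 counters per minute, i.e. roughly
10,000 per row at mod 100, with ten times as many row keys (and so
server-side tasks) per minute.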



--Apple-Mail=_18F8099C-1AF3-46B7-9473-5C1817B2A961--