Subject: Re: Multiple data center performance
From: Daning Wang <daning@netseer.com>
To: user@cassandra.apache.org, comomore@gmail.com
Date: Tue, 11 Jun 2013 14:34:16 -0700

It was the counters that caused the problem: counter writes are replicated
to all replicas regardless of the consistency level.

In our case we don't need to sync the counters across data centers, so
moving the counters to a new keyspace whose replicas all live in one data
center solved the problem.

There is also a replicate_on_write option on the table. Turning it off for
counters might give better performance, but you run a high risk of losing
data and creating inconsistencies. I did not try this option. A rough
sketch of both changes is below.
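For illustration, a minimal cassandra-cli sketch of that kind of setup,
assuming a 1.1/1.2-era cluster; the keyspace and column family names
(counters_local, page_hits) are hypothetical, not from our cluster:

    -- keyspace whose replicas all live in dc1, so counter writes
    -- never fan out across the WAN link to the other data center
    create keyspace counters_local
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {dc1: 3};

    use counters_local;

    -- counter column family; replicate_on_write = false skips the
    -- write-time fan-out to every replica, at the risk of lost
    -- increments and diverging replicas if a node dies
    create column family page_hits
      with comparator = UTF8Type
      and key_validation_class = UTF8Type
      and default_validation_class = CounterColumnType
      and replicate_on_write = false;

Leaving replicate_on_write at its default (true) is the safer choice for
counters; for us, the single-DC keyspace alone was what fixed it.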
Daning


On Sat, Jun 8, 2013 at 6:53 AM, srmore <comomore@gmail.com> wrote:

> I am seeing similar behavior. In my case I have 2 nodes in each
> datacenter, and one node always has high latency (equal to the latency
> between the two datacenters). When one of the datacenters is shut down,
> the latency drops.
>
> I am curious to know whether anyone else has had these issues and, if
> so, how you got around them.
>
> Thanks!
>
>
> On Fri, Jun 7, 2013 at 11:49 PM, Daning Wang <daning@netseer.com> wrote:
>
>> We have deployed multiple data centers but have hit a performance
>> issue. When the nodes in the other data center are up, the read
>> response time seen by clients is 4 or 5 times higher; when we take
>> those nodes down, the response time returns to normal (comparable to
>> the time before we moved to multiple data centers).
>>
>> We have high volume on the cluster, and the consistency level is ONE
>> for reads, so my understanding is that most of the traffic between
>> data centers should be read repair. But it seems that should not
>> create much delay.
>>
>> What could cause the problem? How can we debug this?
>>
>> Here is the keyspace:
>>
>> [default@dsat] describe dsat;
>> Keyspace: dsat:
>>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>>   Durable Writes: true
>>     Options: [dc2:1, dc1:3]
>>   Column Families:
>>     ColumnFamily: categorization_cache
>>
>> Ring:
>>
>> Datacenter: dc1
>> ===============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
>> UN  xx.xx.xx..111  59.2 GB   256     37.5%             4d6ed8d6-870d-4963-8844-08268607757e  rac1
>> DN  xx.xx.xx..121  99.63 GB  256     37.5%             9d0d56ce-baf6-4440-a233-ad6f1d564602  rac1
>> UN  xx.xx.xx..120  66.32 GB  256     37.5%             0fd912fb-3187-462b-8c8a-7d223751b649  rac1
>> UN  xx.xx.xx..118  63.61 GB  256     37.5%             3c6e6862-ab14-4a8c-9593-49631645349d  rac1
>> UN  xx.xx.xx..117  68.16 GB  256     37.5%             ee6cdf23-d5e4-4998-a2db-f6c0ce41035a  rac1
>> UN  xx.xx.xx..116  32.41 GB  256     37.5%             f783eeef-1c51-4f91-ab7c-a60669816770  rac1
>> UN  xx.xx.xx..115  64.24 GB  256     37.5%             e75105fb-b330-4f40-aa4f-8e6e11838e37  rac1
>> UN  xx.xx.xx..112  61.32 GB  256     37.5%             2547ee54-88dd-4994-a1ad-d9ba367ed11f  rac1
>>
>> Datacenter: dc2
>> ===============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
>> DN  xx.xx.xx.199   58.39 GB  256     50.0%             6954754a-e9df-4b3c-aca7-146b938515d8  rac1
>> DN  xx.xx.xx..61   33.79 GB  256     50.0%             91b8d510-966a-4f2d-a666-d7edbe986a1c  rac1
>>
>> Thank you in advance,
>>
>> Daning
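P.S. On the read-repair theory in the quoted question: if cross-DC read
repair does turn out to be a factor, one knob worth knowing about
(Cassandra 1.1+) is the per-column-family read repair chance. A
hypothetical cassandra-cli sketch against the dsat keyspace above:

    use dsat;

    -- keep read repair, but only among replicas in the local data
    -- center; the global (cross-DC) read repair chance drops to zero
    update column family categorization_cache
      with read_repair_chance = 0.0
      and dclocal_read_repair_chance = 0.1;

This only tunes background repair traffic; it does not change which
replicas the coordinator contacts for CL.ONE reads.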