From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Peculiar imbalance affecting 2 machines in a 6 node cluster
Date: Wed, 10 Aug 2011 21:12:41 +1200
In-Reply-To: <8620A665-E834-43A2-864F-713C6B2055A0@bloomdigital.com>
Message-Id: <234ABF6E-FDC1-4001-891B-77BD3FF34B22@thelastpickle.com>

WRT the load imbalance, checking the basics: you've run cleanup after any token moves? Repair is running? Also, sometimes nodes get a bit bloated from repair and will settle down with compaction.

Your slightly odd tokens in the MTL DC are making it a little tricky to understand what's going on. But I want to check that you've followed the multi-DC token selection described here: http://wiki.apache.org/cassandra/Operations#Token_selection . Background on what can happen in a multi-DC deployment if the tokens are not right: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html

This is what you currently have…

DC: LA
IPLA1   Up   Normal   34.57 GB   11.11%   0
IPLA2   Up   Normal   17.55 GB   11.11%   56713727820156410577229101238628035242
IPLA3   Up   Normal   51.37 GB   11.11%   113427455640312821154458202477256070485

DC: MTL
IPMTL1  Up   Normal   34.43 GB   22.22%   37809151880104273718152734159085356828
IPMTL2  Up   Normal   34.56 GB   22.22%   94522879700260684295381835397713392071
IPMTL3  Up   Normal   34.71 GB   22.22%   151236607520417094872610936636341427313

Using the bump approach you would have:

IPLA1   0
IPLA2   56713727820156410577229101238628035242
IPLA3   113427455640312821154458202477256070484

IPMTL1  1
IPMTL2  56713727820156410577229101238628035243
IPMTL3  113427455640312821154458202477256070485

Using the interleaving approach you would have:

IPLA1   0
IPMTL1  28356863910078205288614550619314017621
IPLA2   56713727820156410577229101238628035242
IPMTL2  85070591730234615865843651857942052863
IPLA3   113427455640312821154458202477256070484
IPMTL3  141784319550391026443072753096570088105
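If it helps, here is a rough Python sketch of where those numbers come from. It assumes the RandomPartitioner's 0 to 2**127 token range and simply divides the ring evenly; the values it prints land within a token or so of the figures above, depending on how you round.

# Rough sketch only: evenly spaced RandomPartitioner tokens for two DCs of
# three nodes each. The node-to-token assignment mirrors the lists above.
RING = 2 ** 127  # RandomPartitioner token range

def evenly_spaced(n, offset=0):
    """Tokens for n evenly spaced nodes, shifted by a small per-DC offset."""
    return [(i * RING // n + offset) % RING for i in range(n)]

# "Bump" approach: the second DC reuses the first DC's tokens, bumped by 1.
la_bump  = evenly_spaced(3, offset=0)   # IPLA1..3
mtl_bump = evenly_spaced(3, offset=1)   # IPMTL1..3

# "Interleaving" approach: six evenly spaced tokens, alternating DCs.
interleaved = evenly_spaced(6)
la_ilv  = interleaved[0::2]             # IPLA1..3
mtl_ilv = interleaved[1::2]             # IPMTL1..3

print("bump LA :", la_bump)
print("bump MTL:", mtl_bump)
print("ilv  LA :", la_ilv)
print("ilv  MTL:", mtl_ilv)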
The current setup in LA gives each node in LA 33% of the LA-local ring, which should be right; just checking.

If cleanup / repair / compaction is all good and you are confident the tokens are right, try poking around with nodetool getendpoints to see which nodes keys are sent to. Like you, I cannot see anything obvious in NTS that would cause load to be imbalanced if they are all in the same rack.
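For what it's worth, below is a very rough toy model of the per-DC placement, not the real calculateNaturalEndpoints (it ignores the rack-preference pass entirely, and the node names are just the placeholders from your ring listing). Walking the ring this way with your current tokens hands each LA node the same share of keys, which is why the 2:1:3 split is so surprising.

import bisect
import random

# Toy model only: for each DC, take the first rf nodes clockwise from the
# key's token. This ignores NTS's rack-preference logic, so it is NOT the
# real calculateNaturalEndpoints; it just sanity-checks the expected shares.
RING = 2 ** 127

nodes = sorted([  # (token, name, dc) -- tokens from the current ring
    (0, "IPLA1", "DCLA"),
    (37809151880104273718152734159085356828, "IPMTL1", "DCMTL"),
    (56713727820156410577229101238628035242, "IPLA2", "DCLA"),
    (94522879700260684295381835397713392071, "IPMTL2", "DCMTL"),
    (113427455640312821154458202477256070485, "IPLA3", "DCLA"),
    (151236607520417094872610936636341427313, "IPMTL3", "DCMTL"),
])
tokens = [t for t, _, _ in nodes]

def replicas(key_token, rf_per_dc):
    """First rf nodes of each DC, walking clockwise from key_token."""
    start = bisect.bisect_right(tokens, key_token) % len(nodes)
    chosen, need = [], dict(rf_per_dc)
    for i in range(len(nodes)):
        _, name, dc = nodes[(start + i) % len(nodes)]
        if need.get(dc, 0) > 0:
            chosen.append(name)
            need[dc] -= 1
    return chosen

counts = {name: 0 for _, name, _ in nodes}
random.seed(42)
for _ in range(100000):
    for name in replicas(random.randrange(RING), {"DCLA": 2, "DCMTL": 2}):
        counts[name] += 1

for name in sorted(counts):
    print(name, counts[name])   # roughly equal for the three LA nodes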
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Aug 2011, at 11:24, Mina Naguib wrote:

> Hi everyone
>
> I'm observing a very peculiar type of imbalance and I'd appreciate any help or ideas to try. This is on Cassandra 0.7.8.
>
> The original cluster was 3 machines in DCMTL, equally balanced at 33.33% each and each holding roughly 34G.
>
> Then I added 3 machines in the LA data center. The ring is currently as follows (IP addresses redacted for clarity):
>
> Address  Status  State   Load      Owns    Token
>                                            151236607520417094872610936636341427313
> IPLA1    Up      Normal  34.57 GB  11.11%  0
> IPMTL1   Up      Normal  34.43 GB  22.22%  37809151880104273718152734159085356828
> IPLA2    Up      Normal  17.55 GB  11.11%  56713727820156410577229101238628035242
> IPMTL2   Up      Normal  34.56 GB  22.22%  94522879700260684295381835397713392071
> IPLA3    Up      Normal  51.37 GB  11.11%  113427455640312821154458202477256070485
> IPMTL3   Up      Normal  34.71 GB  22.22%  151236607520417094872610936636341427313
>
> The bump in the 3 MTL nodes (22.22%) is in anticipation of 3 more machines in yet another data center, but they're not ready to join the cluster yet. Once that third DC joins, all nodes will be at 11.11%. However, I don't think this is related.
>
> The problem I'm currently observing is visible in the LA machines, specifically IPLA2 and IPLA3. IPLA2 has 50% of the expected volume, and IPLA3 has 150% of the expected volume.
>
> Putting their load side by side shows the peculiar ratio of 2:1:3 between the 3 LA nodes:
> 34.57 17.55 51.37
> (the same 2:1:3 ratio is reflected in our internal tools trending reads/second and writes/second)
>
> I've tried several iterations of compactions/cleanups to no avail. In terms of config, this is the main keyspace:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Options: [DCMTL:2, DCLA:2]
> And this is the cassandra-topology.properties file (IPs again redacted for clarity):
>   IPMTL1:DCMTL:RAC1
>   IPMTL2:DCMTL:RAC1
>   IPMTL3:DCMTL:RAC1
>   IPLA1:DCLA:RAC1
>   IPLA2:DCLA:RAC1
>   IPLA3:DCLA::RAC1
>   IPLON1:DCLON:RAC1
>   IPLON2:DCLON:RAC1
>   IPLON3:DCLON:RAC1
>   # default for unknown nodes
>   default=DCBAD:RACBAD
>
> One thing that did occur to me while reading the source code for NetworkTopologyStrategy's calculateNaturalEndpoints is that it prefers placing data on different racks. Since all my machines are defined as being in the same rack, I believe that the 2-pass approach would still yield balanced placement.
>
> However, just to test, I modified the topology file live to specify that IPLA1, IPLA2 and IPLA3 are in 3 different racks, and sure enough I saw immediately that the reads/second and writes/second equalized to the expected fair volume (I quickly reverted that change).
>
> So it seems somehow related to rack awareness, but I've been racking my head and I can't figure out how or why, or why the three MTL machines are not affected the same way.
>
> If the solution is to specify them in different racks and run repair on everything, I'm okay with that, but I hate doing that without first understanding *why* the current behavior is the way it is.
>
> Any ideas would be hugely appreciated.
>
> Thank you.
