Subject: Re: Dead node appearing in datastax driver
From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Date: Tue, 1 Apr 2014 17:13:12 +0200

What does "Did that" mean? Does it mean "I upgraded to 2.0.6", or does it mean "I manually removed entries from System.peers"? If the latter, I'd need more info on what you did exactly, what your peers table looked like before, and what it looks like now: there is no reason that deleting the peers entries for hosts that are no longer part of the cluster would have anything to do with write latency. (Though if, say, you removed the wrong entries, that might make the driver think some live host had been removed, and if the driver has fewer nodes over which to dispatch queries, that might impact latency, I suppose -- at least that's the only related thing I can think of.)

--
Sylvain


On Tue, Apr 1, 2014 at 2:44 PM, Apoorva Gaurav <apoorva.gaurav@myntra.com> wrote:

> Did that and I actually see a significant reduction in write latency.
>
> On Tue, Apr 1, 2014 at 5:35 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>
>> On Tue, Apr 1, 2014 at 1:49 PM, Apoorva Gaurav <apoorva.gaurav@myntra.com> wrote:
>>
>>> Hello Sylvain,
>>>
>>> Queried system.peers on three live nodes and host4 is appearing on two
>>> of these.
>>
>> That's why the driver thinks they are still there. You're most probably
>> running into https://issues.apache.org/jira/browse/CASSANDRA-6053 since
>> you are on C* 2.0.4.
>> As said, this is relatively harmless, but you should
>> think about upgrading to 2.0.6 to fix it for good (you could manually
>> remove the bad entries from System.peers in the meantime if you want; they
>> are really just leftovers that shouldn't be there).
>>
>> --
>> Sylvain
>>
>>> On Tue, Apr 1, 2014 at 5:06 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>
>>>> On Tue, Apr 1, 2014 at 12:50 PM, Apoorva Gaurav <apoorva.gaurav@myntra.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> We had a 4-node Cassandra 2.0.4 cluster (let's call the nodes host1,
>>>>> host2, host3 and host4), out of which we've removed one node (host4) using
>>>>> the nodetool removenode command. Now, using nodetool status or nodetool ring,
>>>>> we no longer see host4. It's also not appearing in DataStax OpsCenter. But it's
>>>>> intermittently appearing in Metadata.getAllHosts() while connecting using
>>>>> DataStax driver 1.0.4.
>>>>>
>>>>> A couple of questions:
>>>>> - How is it still appearing?
>>>>
>>>> Not sure. Can you try querying the peers system table on each of your
>>>> nodes (with cqlsh: SELECT * FROM system.peers) and see if host4 is
>>>> still mentioned somewhere?
>>>>
>>>>> - Can this have an impact on the read/write performance of the client?
>>>>
>>>> No. If the host doesn't exist, the driver might try to reconnect to it
>>>> at times, but since it won't be able to, it won't try to use it for reads
>>>> and writes. That does mean you might have a reconnection task running with
>>>> some regularity, but 1) it's not on the read/write path of queries, and 2)
>>>> provided you've left the default reconnection policy, this will happen once
>>>> every 10 minutes and will be cheap enough to consume a completely negligible
>>>> amount of resources. That doesn't mean I'm not interested in tracking down
>>>> why it happens in the first place, though.
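Provided the default reconnection policy is left in place, the Java driver backs off between reconnection attempts up to a cap, which is why the steady state settles at one attempt every 10 minutes as described above. A rough sketch of such an exponential backoff schedule, assuming a 1-second base delay and a 10-minute cap (illustrative numbers only, not the driver's exact internals):

```java
public class BackoffSketch {
    // Exponential backoff: the delay doubles with each failed attempt,
    // capped at maxMs. Attempt numbering starts at 1.
    static long delayMs(long baseMs, long maxMs, int attempt) {
        long d = baseMs << Math.min(attempt - 1, 30); // shift guard avoids overflow
        return Math.min(d, maxMs);
    }

    public static void main(String[] args) {
        long base = 1_000, cap = 600_000; // 1 s base, 10 min cap (assumed values)
        for (int attempt = 1; attempt <= 11; attempt++) {
            System.out.printf("attempt %2d -> %d ms%n",
                    attempt, delayMs(base, cap, attempt));
        }
    }
}
```

With these assumed parameters, the delay hits the 10-minute cap by the eleventh attempt and stays there, which is the "once every 10 minutes" behavior mentioned in the reply.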
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>>> Code which we are using to connect is:
>>>>>
>>>>>     public void connect() {
>>>>>         PoolingOptions poolingOptions = new PoolingOptions();
>>>>>         cluster = Cluster.builder()
>>>>>                 .addContactPoints(inetAddresses.toArray(new String[]{}))
>>>>>                 .withLoadBalancingPolicy(new RoundRobinPolicy())
>>>>>                 .withPoolingOptions(poolingOptions)
>>>>>                 .withPort(port)
>>>>>                 .withCredentials(username, password)
>>>>>                 .build();
>>>>>         Metadata metadata = cluster.getMetadata();
>>>>>         System.out.printf("Connected to cluster: %s\n",
>>>>>                 metadata.getClusterName());
>>>>>         for (Host host : metadata.getAllHosts()) {
>>>>>             System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
>>>>>                     host.getDatacenter(), host.getAddress(), host.getRack());
>>>>>         }
>>>>>     }
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Apoorva
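If you do attempt the manual cleanup described earlier in the thread, a safe approach is to diff the addresses each node's system.peers table reports against the live ring shown by nodetool status, and only delete rows present in the former but not the latter. A minimal sketch of that diff in plain Java (the host addresses are hypothetical stand-ins for host1..host4; the actual DELETE FROM system.peers WHERE peer = ... would be run via cqlsh on each affected node):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StalePeers {
    // Peers listed in system.peers that are not members of the live ring
    // are the candidates for a manual DELETE.
    static Set<String> stalePeers(Set<String> peersTable, Set<String> ring) {
        Set<String> stale = new HashSet<>(peersTable);
        stale.removeAll(ring);
        return stale;
    }

    public static void main(String[] args) {
        // Hypothetical addresses standing in for the four nodes in the thread.
        Set<String> peers = new HashSet<>(Arrays.asList("10.0.0.2", "10.0.0.3", "10.0.0.4"));
        Set<String> ring  = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        System.out.println(stalePeers(peers, ring)); // prints [10.0.0.4]
    }
}
```

Deleting only entries outside the live ring avoids the failure mode Sylvain warns about, where removing the wrong rows makes the driver drop a live host from its query plan.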