zookeeper-user mailing list archives

From s influxdb <elastic....@gmail.com>
Subject Re: node 2 not rejoining cluster
Date Fri, 08 Apr 2016 01:50:42 GMT
I ran tcpdump on all three nodes.
It looks like for every [PSH, ACK] there is a missing [ACK] from the
other nodes to node 2 on port 3888.
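
The telnet check on ports 2888 and 3888 mentioned in the quoted messages below can be scripted so that it is easy to repeat from every node. This is only a minimal sketch: the hostnames in PEERS are hypothetical placeholders (substitute the hosts from the server.N lines in zoo.cfg), and the helper names can_connect and check_peers are made up for illustration.

```python
import socket

# Hypothetical peer hostnames; replace with the hosts from zoo.cfg.
PEERS = ["zk1.example.com", "zk3.example.com"]
# The two ports discussed in this thread.
PORTS = [2888, 3888]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(peers, ports, timeout=5.0):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {(h, p): can_connect(h, p, timeout) for h in peers for p in ports}
```

Calling check_peers(PEERS, PORTS) on node 2, and then on each of the other nodes with node 2 in the peer list, shows whether connections can be opened in both directions. A successful connect only proves that the TCP handshake completed; it does not rule out packets being dropped afterwards, which is what the tcpdump output above suggests.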


On Thu, Apr 7, 2016 at 1:29 PM, s influxdb <elastic.l.k@gmail.com> wrote:

> Thanks Flavio for your quick replies.
> The zookeeper version is 3.4.6
>
>
>
> On Thu, Apr 7, 2016 at 1:23 PM, Flavio P JUNQUEIRA <fpj@apache.org> wrote:
>
>> You need to determine why it is not receiving notification messages. From
>> the information you've given, it doesn't look like a zookeeper code issue.
>>
>> BTW, which version are you using?
>>
>> -Flavio
>> On 7 Apr 2016 21:20, "s influxdb" <elastic.l.k@gmail.com> wrote:
>>
>> > Nothing on the iptables firewall.
>> >
>> > What options do I have to reconnect this node to the cluster?
>> >
>> >
>> > On Thu, Apr 7, 2016 at 10:14 AM, s influxdb <elastic.l.k@gmail.com> wrote:
>> >
>> > > telnet works on 2888 and 3888 to the other nodes. Now I see
>> > > java.net.SocketTimeoutException: connect timed out messages in the
>> > > logs for node 2.
>> > >
>> > > On Thu, Apr 7, 2016 at 3:05 AM, Flavio Junqueira <fpj@apache.org> wrote:
>> > >
>> > >> I only see notifications from the node to itself. It says that it is
>> > >> connected to 1, but it doesn't seem to be receiving the notification
>> > >> from 1. It also doesn't seem to be receiving the connection request
>> > >> from 3.
>> > >>
>> > >> The last time I saw something like this, it was due to iptables
>> > >> rules, but if it was working before and no configuration has changed,
>> > >> then I don't know what it could be.
>> > >>
>> > >> -Flavio
>> > >>
>> > >> > On 07 Apr 2016, at 05:43, s influxdb <elastic.l.k@gmail.com> wrote:
>> > >> >
>> > >> > this is the pastie
>> > >> > http://pastie.org/10788301
>> > >> >
>> > >> > On Wed, Apr 6, 2016 at 9:41 PM, s influxdb <elastic.l.k@gmail.com> wrote:
>> > >> >
>> > >> >> One of our nodes hit an OOM (java.lang.OutOfMemoryError: unable to
>> > >> >> create new native thread) and then became unresponsive.
>> > >> >>
>> > >> >> We tried to add the node back to the cluster but with no luck.
>> > >> >>
>> > >> >> It doesn't seem to receive any notification messages from the
>> > >> >> other nodes.
>> > >> >> It keeps "Sending notifications" in a loop.
>> > >> >>
>> > >> >> Please see attached the logs of the node that is out of rotation.
>> > >> >>
>> > >> >> Any inputs appreciated.
>> > >> >>
>> > >> >> Thanks
>> > >> >>
>> > >>
>> > >>
>> > >
>> >
>>
>
>
