zookeeper-user mailing list archives

From s influxdb <elastic....@gmail.com>
Subject Re: node 2 not rejoining cluster
Date Mon, 11 Apr 2016 20:35:40 GMT
Rebooting the server didn't help.

On Thu, Apr 7, 2016 at 6:50 PM, s influxdb <elastic.l.k@gmail.com> wrote:

> I ran tcpdump on all three nodes.
> It looks like, for every [PSH, ACK], there is no corresponding [ACK] from
> the other nodes back to this second node on port 3888.
>
>
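One way to make that tcpdump observation concrete is to tally TCP flags per direction. The sketch below is a hypothetical helper, assuming the capture was printed in tcpdump's default one-line IPv4 format with numeric addresses (for example, `tcpdump -nn port 3888`); in that notation `[P.]` is PSH+ACK and `[.]` is a bare ACK. The addresses in `sample` are made up for illustration and are not from this thread.

```python
import re
from collections import Counter

# Matches tcpdump's default one-line IPv4 TCP output when run with -nn, e.g.
# "20:35:40.100000 IP 10.0.0.2.41234 > 10.0.0.1.3888: Flags [P.], seq 1:29, ..."
# With -nn, addresses and ports stay numeric; the port is the last dotted field.
LINE_RE = re.compile(
    r"IP (?P<src>[\d.]+)\.(?P<sport>\d+) > (?P<dst>[\d.]+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)

def tally_flags(lines, port=3888):
    """Count TCP flag combinations per direction for segments to or from `port`.

    In tcpdump notation "P." is PSH+ACK and "." is a bare ACK.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an IPv4 TCP line in the expected format
        if port not in (int(m["sport"]), int(m["dport"])):
            continue
        counts[(f"{m['src']} -> {m['dst']}", m["flags"])] += 1
    return counts

# Hypothetical capture lines (made-up addresses) shaped like the symptom:
# node 2 pushes data to port 3888, retransmits it, and no ACK ever comes back.
sample = [
    "20:35:40.100000 IP 10.0.0.2.41234 > 10.0.0.1.3888: Flags [P.], seq 1:29, ack 1, win 229, length 28",
    "20:35:40.300000 IP 10.0.0.2.41234 > 10.0.0.1.3888: Flags [P.], seq 1:29, ack 1, win 229, length 28",
]

for (direction, flags), n in sorted(tally_flags(sample).items()):
    print(direction, flags, n)
```

In a healthy exchange, each `P.` segment in one direction would be matched by `.` segments in the other; a count of zero for the reverse direction is the pattern described in the message above.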
> On Thu, Apr 7, 2016 at 1:29 PM, s influxdb <elastic.l.k@gmail.com> wrote:
>
>> Thanks, Flavio, for your quick replies.
>> The ZooKeeper version is 3.4.6.
>>
>>
>>
>> On Thu, Apr 7, 2016 at 1:23 PM, Flavio P JUNQUEIRA <fpj@apache.org>
>> wrote:
>>
>>> You need to determine why it is not receiving notification messages. From
>>> the information you've given, it doesn't look like a ZooKeeper code issue.
>>>
>>> BTW, which version are you using?
>>>
>>> -Flavio
>>> On 7 Apr 2016 21:20, "s influxdb" <elastic.l.k@gmail.com> wrote:
>>>
>>> > Nothing on the iptables firewall.
>>> >
>>> > What options do I have to reconnect this node to the cluster?
>>> >
>>> >
>>> > On Thu, Apr 7, 2016 at 10:14 AM, s influxdb <elastic.l.k@gmail.com> wrote:
>>> >
>>> > > telnet works on ports 2888 and 3888 to the other nodes. Now I see
>>> > > "java.net.SocketTimeoutException: connect timed out" messages in the
>>> > > logs for node 2.
>>> > >
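A telnet session only shows that a connection eventually opened; scripting the same check makes it easy to repeat from every node and to tell a silent timeout apart from a refused connection. This is a hedged sketch with hypothetical function names. It assumes the usual `server.N=host:2888:3888` layout in zoo.cfg, where 2888 is the port followers use to reach the leader and 3888 is the leader-election port.

```python
import socket

def probe(host, port, timeout=3.0):
    """Attempt a TCP connection, much like `telnet host port`, and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within `timeout`: packets dropped somewhere (e.g. a firewall rule) or host down.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"

def check_ensemble(peers, ports=(2888, 3888), timeout=3.0):
    """Probe each peer on the quorum (2888) and leader-election (3888) ports."""
    return {(host, port): probe(host, port, timeout) for host in peers for port in ports}
```

For example, `check_ensemble(["zk1.example.com", "zk3.example.com"])` (hypothetical hostnames) returns one result per host and port. Running it from node 2 toward the others, and from the others toward node 2, covers both directions; a `"timeout"` rather than `"refused"` would be consistent with the `SocketTimeoutException` reported above.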
>>> > > On Thu, Apr 7, 2016 at 3:05 AM, Flavio Junqueira <fpj@apache.org> wrote:
>>> > >
>>> > >> I only see notifications from the node to itself. It says that it is
>>> > >> connected to 1, but it doesn't seem to be receiving the notification
>>> > >> from 1. It also doesn't seem to be receiving the connection request
>>> > >> from 3.
>>> > >>
>>> > >> The last time I saw something like this, it was due to iptables
>>> > >> rules, but if it was working before and no configuration has changed,
>>> > >> then I don't know what it could be.
>>> > >>
>>> > >> -Flavio
>>> > >>
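Flavio's point about notifications follows from how leader election settles: a server only accepts a leader once a majority of the ensemble backs the same proposal, so a node that hears only its own notification keeps re-sending. The sketch below is a deliberately simplified model, not ZooKeeper's Java code; `Vote`, `supersedes`, and `quorum_leader` are hypothetical names. The vote ordering mirrors FastLeaderElection's rule (epoch, then zxid, then server id), but the real implementation also tracks election rounds and server weights, which are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Vote:
    """Simplified election notification: the proposed leader and its ordering fields."""
    leader: int  # server id being proposed as leader
    zxid: int    # last transaction id attributed to the proposed leader
    epoch: int   # epoch of the proposed leader

def supersedes(new: Vote, cur: Vote) -> bool:
    """True if `new` should replace `cur`: compare epoch, then zxid, then server id."""
    return (new.epoch, new.zxid, new.leader) > (cur.epoch, cur.zxid, cur.leader)

def quorum_leader(received: Dict[int, Vote], ensemble_size: int) -> Optional[int]:
    """Return a leader backed by a strict majority of the ensemble, else None.

    `received` maps sender id -> latest vote from that sender, including this
    server's own vote.
    """
    tally: Dict[int, int] = {}
    for vote in received.values():
        tally[vote.leader] = tally.get(vote.leader, 0) + 1
    for leader, n in tally.items():
        if n > ensemble_size // 2:
            return leader
    return None

# Node 2 in a three-node ensemble that has only heard its own notification
# (zxid and epoch values are made up):
alone = {2: Vote(leader=2, zxid=0x100, epoch=5)}
print(quorum_leader(alone, ensemble_size=3))  # None: 1 vote is not a majority of 3
```

With three servers, a majority is two, so node 2 needs at least one notification from server 1 or 3 before any outcome is possible, which is why the missing inbound notifications matter more than the outbound ones.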
>>> > >> > On 07 Apr 2016, at 05:43, s influxdb <elastic.l.k@gmail.com> wrote:
>>> > >> >
>>> > >> > Here is the pastie:
>>> > >> > http://pastie.org/10788301
>>> > >> >
>>> > >> > On Wed, Apr 6, 2016 at 9:41 PM, s influxdb <elastic.l.k@gmail.com> wrote:
>>> > >> >
>>> > >> >> One of our nodes hit an OOM ("java.lang.OutOfMemoryError: unable to
>>> > >> >> create new native thread") and then became unresponsive.
>>> > >> >>
>>> > >> >> We tried to add the node back to the cluster, but with no luck.
>>> > >> >>
>>> > >> >> It doesn't seem to receive any notification messages from the other
>>> > >> >> nodes, and it keeps "Sending notifications" in a loop.
>>> > >> >>
>>> > >> >> Please see the attached logs of the node that is out of rotation.
>>> > >> >>
>>> > >> >> Any input appreciated.
>>> > >> >>
>>> > >> >> Thanks
>>> > >> >>
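"java.lang.OutOfMemoryError: unable to create new native thread" usually means the operating system refused to give the JVM another thread (for example, because a per-user process/thread limit was reached or there was not enough native memory for a new thread stack), rather than the Java heap being full. As a hedged, Linux-only sketch, the hypothetical helpers below read a process's current thread count from `/proc/<pid>/status` and its soft "Max processes" limit from `/proc/<pid>/limits`, which would show whether the ZooKeeper JVM was running close to that limit.

```python
from typing import Optional

def thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")

def max_processes(limits_text: str) -> Optional[int]:
    """Return the soft 'Max processes' limit from /proc/<pid>/limits (None if unlimited)."""
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            soft = line.split()[2]  # columns: Max, processes, <soft>, <hard>, processes
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max processes' line found")

def thread_usage(pid: int):
    """Linux only: (threads in `pid`, soft per-user process/thread limit of `pid`)."""
    with open(f"/proc/{pid}/status") as f:
        threads = thread_count(f.read())
    with open(f"/proc/{pid}/limits") as f:
        limit = max_processes(f.read())
    return threads, limit
```

The "Max processes" limit counts every thread owned by the same user, not just one JVM, so a ZooKeeper process well under the limit can still fail if other processes running as that user hold many threads.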
>>> > >>
>>> > >>
>>> > >
>>> >
>>>
>>
>>
>
