zookeeper-user mailing list archives

From upendar devu <devulapal...@gmail.com>
Subject Re: Zookeeper -java.net.SocketException: Socket closed
Date Thu, 25 Jan 2018 16:40:31 GMT
Thank you, will check

On Thu, Jan 25, 2018 at 11:10 AM, Andor Molnar <andor@cloudera.com> wrote:

> Use EBS drives and make sure you allocate enough IOPS for the load.
>
> Andor
>
>
> On Thu, Jan 25, 2018 at 4:21 PM, upendar devu <devulapalli8@gmail.com>
> wrote:
>
> > "a disk write has taken too long as well": I will check on this, thanks
> > for finding it. The ZK logs are really a bit difficult for me to understand.
> >
> > On Thu, Jan 25, 2018 at 10:19 AM, upendar devu <devulapalli8@gmail.com>
> > wrote:
> >
> > > Thanks for sharing the analysis. The instances are running on EC2, and
> > > we have Kafka, ZK, Storm and ES instances as well, but we have not seen
> > > such errors in those components. If there were network latency, there
> > > should be socket errors in the other components too, as data is being
> > > processed every second.
> > >
> > > Let's hear from the ZooKeeper dev team; I hope they will respond.
> > >
> > > On Thu, Jan 25, 2018 at 6:39 AM, Andor Molnar <andor@cloudera.com>
> > wrote:
> > >
> > >> No, this is not the bug I was thinking of.
> > >>
> > >> Looks like the network connection is poor between the leader and the
> > >> follower whose logs were attached. Do you have any other network
> > >> monitoring tools in place, or do you see any network-related error
> > >> messages in your kernel logs?
> > >> Follower lost the connection to the leader:
> > >> 2018-01-23 07:40:21,709 [myid:3] - WARN
> > >> [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader,
> > >> exception during packet send
> > >>
> > >> ...and took ages to recover: 944 secs!!
> > >> 2018-01-23 07:56:05,742 [myid:3] - INFO
> > >> [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
> > >> ELECTION TOOK - 944020
> > >>
> > >> Additionally, a disk write has taken too long as well:
> > >> 2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] -
> > >> fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
> > >> adversely effect operation latency. See the ZooKeeper troubleshooting
> > >> guide
> > >>
> > >> I believe this is worth a closer look, though I'm not a ZooKeeper
> > >> expert; maybe somebody else can give you more insight.
> > >>
> > >> Regards,
> > >> Andor
> > >>
> > >>
> > >> On Wed, Jan 24, 2018 at 7:47 PM, upendar devu <devulapalli8@gmail.com>
> > >> wrote:
> > >>
> > >> > Thanks Andor for the reply.
> > >> >
> > >> > We are using ZooKeeper version 3.4.6 with 3 instances. Please see the
> > >> > configuration below; I believe we are using the default configuration.
> > >> > I have attached the ZK log. The issue occurred at:
> > >> > First Occurrence: 01/23/2018 07:42:22   Last Occurrence: 01/23/2018 07:43:22
> > >> >
> > >> >
> > >> > The issue occurs 3 to 4 times a month and resolves itself within a few
> > >> > minutes, but it is really annoying our operations team. Please let me
> > >> > know if you need any additional details.
> > >> >
> > >> >
> > >> >
> > >> > # The number of milliseconds of each tick
> > >> > tickTime=2000
> > >> >
> > >> > # The number of ticks that the initial synchronization phase can take
> > >> > initLimit=10
> > >> >
> > >> > # The number of ticks that can pass between sending a request and getting an acknowledgement
> > >> > syncLimit=5
> > >> >
> > >> > # The directory where the snapshot is stored.
> > >> > dataDir=/opt/zookeeper/current/data
> > >> >
> > >> > # The port at which the clients will connect
> > >> > clientPort=2181
> > >> >
> > >> > # This is the list of Zookeeper peers:
> > >> > server.1=zookeeper1:2888:3888
> > >> > server.2=zookeeper2:2888:3888
> > >> > server.3=zookeeper3:2888:3888
> > >> >
> > >> > # The interface IP address(es) on which ZooKeeper will listen
> > >> > clientPortAddress=<IP of zk>
> > >> >
> > >> > # The number of snapshots to retain in dataDir
> > >> > autopurge.snapRetainCount=3
> > >> >
> > >> > # Purge task interval in hours
> > >> > # Set to "0" to disable auto purge feature
> > >> > autopurge.purgeInterval=1
> > >> >
> > >> >
> > >> > On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar <andor@cloudera.com>
> > >> wrote:
> > >> >
> > >> >> Hi Upendar,
> > >> >>
> > >> >> Thanks for reporting the issue.
> > >> >> I have a gut feeling about which existing bug you've run into, but
> > >> >> would you please share some more detail (version of ZK, log context,
> > >> >> config files, etc.) so I can confirm?
> > >> >>
> > >> >> Thanks,
> > >> >> Andor
> > >> >>
> > >> >>
> > >> >> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu <devulapalli8@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > We are getting the error below twice a month. Though it resolves
> > >> >> > itself, can anyone explain why this error occurs and what needs to
> > >> >> > be done to prevent it? Is this a common error that can be ignored?
> > >> >> >
> > >> >> > Please suggest.
> > >> >> >
> > >> >> >
> > >> >> > 2018-01-16 20:36:17,378 [myid:2] - WARN
> > >> >> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id
> > >> >> > 3, my id = 2, error = java.net.SocketException: Socket closed
> > >> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
> > >> >> >     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> > >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:171)
> > >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:141)
> > >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:224)
> > >> >> >     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > >> >> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
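
For anyone reading this thread later, the numbers quoted above can be lined up against each other. The sketch below is only a rough reading, not a diagnosis. It takes the tickTime, initLimit and syncLimit values from the posted configuration, converts the tick-based limits to milliseconds, and compares the 13638 ms fsync from the FileTxnLog warning with the syncLimit window. The helper names (parse_cfg, limits_ms, fsync_ms) are made up for illustration and are not part of ZooKeeper; the log line is the one quoted above, joined onto a single line.

```python
import re

# Settings copied from the configuration quoted in this thread.
ZOO_CFG = """
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
"""

# The fsync warning quoted in this thread, joined onto one line.
FSYNC_WARNING = (
    "2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] - "
    "fsync-ing the write ahead log in SyncThread:3 took 13638ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide"
)


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def limits_ms(settings):
    """Convert the tick-based limits into milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "initLimit": int(settings["initLimit"]) * tick,
        "syncLimit": int(settings["syncLimit"]) * tick,
    }


def fsync_ms(log_line):
    """Return the fsync duration in ms from a FileTxnLog warning, or None."""
    match = re.search(r"took (\d+)ms", log_line)
    return int(match.group(1)) if match else None


limits = limits_ms(parse_cfg(ZOO_CFG))
stall = fsync_ms(FSYNC_WARNING)

print(limits)                             # {'initLimit': 20000, 'syncLimit': 10000}
print(stall, stall > limits["syncLimit"])  # 13638 True
```

With the values from this thread, syncLimit works out to 10000 ms, so the 13638 ms fsync is longer than the window a follower has to stay in sync with the leader. That is consistent with Andor's suggestion to check disk IOPS, though it does not by itself rule out a network problem.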
