bookkeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sijie Guo <guosi...@gmail.com>
Subject Re: Help with bad errors on 4.6.1
Date Thu, 08 Mar 2018 19:52:40 GMT
On Thu, Mar 8, 2018 at 7:42 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:

>
>
> 2018-03-08 14:50 GMT+01:00 Ivan Kelly <ivank@apache.org>:
>
>> It just occurred to me that this could be a problem with the recycler.
>> If we recycle a buffer too early, but then keep using it, another user
>> could pick it up, and between them they could corrupt the data that
>> would cause it to be unhandled.
>>
>
>
> Maybe current work from Sijie, Andrey and JV which Sijie is backporting to
> branch-4.6 could resolve the issue
>


I think that is unrelated. The change Andrey, JV and me dealt with is the
PendingAddOp. It is on client side not bookie side.


>
>
>
>
>
>>
>> -Ivan
>>
>> On Thu, Mar 8, 2018 at 1:49 PM, Enrico Olivelli <eolivelli@gmail.com>
>> wrote:
>> >
>> >
>> > 2018-03-08 11:57 GMT+01:00 Ivan Kelly <ivank@apache.org>:
>> >>
>> >> Hmm, yes, looks messed up. Do you have a reliable repro of this?
>> >>
>> >> "at io.netty.channel.DefaultChannelPipeline.onUnhandledInboundMessage"
>> >> <- it seems the message itself wasn't handled. Do you have any more
>> >> bookie side logs leading up to this event?
>> >
>> >
>> >
>> > The only suspicious logs are for logger named
>> > "org.apache.bookkeeper.bookie.EntryLogger"
>> >
>> >
>> > 18-03-08-09-22-40       Created new entry log file
>> > /data/qasql/bookie/./bookie_data/current/4b0.log for logId 1200.
>> > 18-03-08-09-32-41       Failed to get ledgers map index from: 53.log :
>> No
>> > ledgers map index found on entryLogId53
>> > 18-03-08-09-33-10       Failed to get ledgers map index from: 54.log :
>> No
>> > ledgers map index found on entryLogId54
>> > 18-03-08-09-33-10       Failed to get ledgers map index from: 375.log :
>> No
>> > ledgers map index found on entryLogId375
>> > 18-03-08-09-33-13       Failed to get ledgers map index from: 878.log :
>> No
>> > ledgers map index found on entryLogId878
>> > 18-03-08-09-33-22       Failed to get ledgers map index from: 1197.log
>> : No
>> > ledgers map index found on entryLogId1197
>> > 18-03-08-09-33-57       Failed to get ledgers map index from: 1198.log
>> : No
>> > ledgers map index found on entryLogId1198
>> > 18-03-08-09-34-54       Failed to get ledgers map index from: 1199.log
>> : No
>> > ledgers map index found on entryLogId1199
>>
>

This is also unrelated. It is a fine logging, since it means it attempts to
read ledgers map index, but didn't find one.

the ledgers map index is only written when it is enabled at the bookies.


> >
>> > I will try to run Bookie with level debug, but it produces lots of spam
>> >
>> > Thanks
>> > Enrico
>> >
>> >>
>> >>
>> >> -Ivan
>> >>
>> >> On Thu, Mar 8, 2018 at 9:18 AM, Enrico Olivelli <eolivelli@gmail.com>
>> >> wrote:
>> >> > Hi all,
>> >> > I am seeing this bad errors in some test environments with 4.6.1
>> >> > The errors appear during rolling restart of the application, the test
>> >> > env is
>> >> > made of 6 machines:
>> >> > - 3 bookies
>> >> > - 3 client machines (with several BK clients, of different
>> sub-systems)
>> >> > - running with 4.6.1 both client and servers
>> >> >
>> >> > *I do not have* reports of this errors from production, already
>> running
>> >> > 4.6.1 for the last month
>> >> >
>> >> > But the problem is quite scary
>> >> >
>> >> > This is a sample of relevant errors on clients (in this case
>> Majordodo
>> >> > brokers, with log level = INFO)
>> >> >
>> >> > logs/org.apache.bookkeeper.client.PendingAddOp:
>> >> > 18-03-07-09-51-43       Write of ledger entry to quorum failed:
>> L366634
>> >> > E2557
>> >> > 18-03-07-09-51-43       Write of ledger entry to quorum failed:
>> L366634
>> >> > E2558
>> >> > 18-03-07-09-51-43       Write of ledger entry to quorum failed:
>> L366634
>> >> > E2559
>> >> > 18-03-07-15-59-55       Failed to write entry (366680, 1865): Bookie
>> >> > operation timeout
>> >> > 18-03-07-15-59-55       Failed to write entry (366680, 1865): Bookie
>> >> > operation timeout
>> >> > 18-03-07-16-00-00       Failed to write entry (366680, 1865): Bookie
>> >> > operation timeout
>> >> > 18-03-07-16-00-00       Failed to write entry (366680, 1865): Bookie
>> >> > operation timeout
>> >> > 18-03-07-16-00-05       Failed to write entry (366680, 1865): Bookie
>> >> > operation timeout
>> >> >
>> >> >
>> >> > org.apache.bookkeeper.proto.PerChannelBookieClient:
>> >> >
>> >> > 18-03-07-10-06-44       Unexpected exception caught by bookie client
>> >> > channel
>> >> > handler
>> >> > 18-03-07-10-06-44       io.netty.handler.codec.DecoderException:
>> >> > java.lang.IllegalStateException: Received unknown response : op
>> code = 6
>> >> > io.netty.handler.codec.DecoderException:
>> >> > java.lang.IllegalStateException:
>> >> > Received unknown response : op code = 6
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.MessageToMessageDecoder.channelRead(M
>> essageToMessageDecoder.java:98)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
>> ByteToMessageDecoder.java:310)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
>> ByteToMessageDecoder.java:297)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteT
>> oMessageDecoder.java:413)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.channelRead(Byte
>> ToMessageDecoder.java:265)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline$HeadContext.
>> channelRead(DefaultChannelPipeline.java:1414)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline.fireChannelRead(Defa
>> ultChannelPipeline.java:945)
>> >> >         at
>> >> >
>> >> > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStrea
>> mUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
>> >> >         at
>> >> >
>> >> > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEven
>> tLoop.java:404)
>> >> >         at
>> >> > io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
>> >> >         at
>> >> >
>> >> > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(
>> SingleThreadEventExecutor.java:886)
>> >> >         at java.base/java.lang.Thread.run(Thread.java:844)
>> >> > Caused by: java.lang.IllegalStateException: Received unknown
>> response :
>> >> > op
>> >> > code = 6
>> >> >         at
>> >> >
>> >> > org.apache.bookkeeper.proto.BookieProtoEncoding$ResponseEnDe
>> CoderPreV3.decode(BookieProtoEncoding.java:294)
>> >> >         at
>> >> >
>> >> > org.apache.bookkeeper.proto.BookieProtoEncoding$ResponseDeco
>> der.decode(BookieProtoEncoding.java:478)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.MessageToMessageDecoder.channelRead(M
>> essageToMessageDecoder.java:88)
>> >> >
>> >> >
>> >> >         ... 19 more
>> >> >
>> >> >
>> >> > Errors on Bookie side, with log level = INFO)
>> >> >
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline:
>> >> > 8-03-07-09-51-38       An exceptionCaught() event was fired, and it
>> >> > reached
>> >> > at the tail of the pipeline. It usually means the last handler in the
>> >> > pipeline did not handle the exception.
>> >> > 18-03-07-09-51-38       io.netty.util.IllegalReferenc
>> eCountException:
>> >> > refCnt: 0, increment: 1
>> >> > io.netty.util.IllegalReferenceCountException: refCnt: 0, increment:
>> 1
>> >> >         at
>> >> >
>> >> > io.netty.buffer.AbstractReferenceCountedByteBuf.release0(Abs
>> tractReferenceCountedByteBuf.java:100)
>> >> >         at
>> >> >
>> >> > io.netty.buffer.AbstractReferenceCountedByteBuf.release(Abst
>> ractReferenceCountedByteBuf.java:84)
>> >> >         at
>> >> > io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88)
>> >> >         at
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline.onUnhandledInboundMe
>> ssage(DefaultChannelPipeline.java:1202)
>> >> >         at
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline$TailContext.
>> channelRead(DefaultChannelPipeline.java:1299)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(Ch
>> annelInboundHandlerAdapter.java:86)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > org.apache.bookkeeper.proto.BookieRequestHandler.channelRead
>> (BookieRequestHandler.java:78)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(Ch
>> annelInboundHandlerAdapter.java:86)
>> >> >         at
>> >> >
>> >> > org.apache.bookkeeper.proto.AuthHandler$ServerSideHandler.ch
>> annelRead(AuthHandler.java:88)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.MessageToMessageDecoder.channelRead(M
>> essageToMessageDecoder.java:102)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
>> ByteToMessageDecoder.java:310)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
>> ByteToMessageDecoder.java:297)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteT
>> oMessageDecoder.java:413)
>> >> >         at
>> >> >
>> >> > io.netty.handler.codec.ByteToMessageDecoder.channelRead(Byte
>> ToMessageDecoder.java:265)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRe
>> ad(AbstractChannelHandlerContext.java:340)
>> >> >         at
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline$HeadContext.
>> channelRead(DefaultChannelPipeline.java:1414)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:362)
>> >> >         at
>> >> >
>> >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannel
>> Read(AbstractChannelHandlerContext.java:348)
>> >> >         at
>> >> >
>> >> > io.netty.channel.DefaultChannelPipeline.fireChannelRead(Defa
>> ultChannelPipeline.java:945)
>> >> >         at
>> >> >
>> >> > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStrea
>> mUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
>> >> >         at
>> >> >
>> >> > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEven
>> tLoop.java:404)
>> >> >         at
>> >> > io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
>> >> >         at
>> >> >
>> >> > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(
>> SingleThreadEventExecutor.java:886)
>> >> >         at java.base/java.lang.Thread.run(Thread.java:844)
>>
>
questions:

1) which protocol version are you using?

2) is the exception coming from read response or write response? it sound
like write response. If it is write response, it would be surprised about
the ref count. because we don't send any buffer back in write response.


> >> >
>> >> >
>> >
>> >
>>
>
>

Mime
View raw message