bookkeeper-user mailing list archives

From Sijie Guo <guosi...@gmail.com>
Subject Re: BookKeeper#openLedgerNoRecovery hangs
Date Wed, 19 Jul 2017 09:11:21 GMT
On Wed, Jul 19, 2017 at 4:04 PM, Enrico Olivelli <eolivelli@gmail.com>
wrote:

> Hi,
> in some internal benchmarks we are seeing openLedgerNoRecovery calls
> that hang.
> As far as I can see, that function basically calls ZooKeeper#getData.
>

> Does anyone have an idea of how this can happen?
>

What version are you testing? Is it related to your recent change bumping
the ZooKeeper version? If so, we should consider rolling back the
ZooKeeper version.


>
> Is there any implicit timeout on ZK.getData()? I did not find any way to
> set one, and personally I have never run into this problem.
>

As far as I know, there is no timeout on ZooKeeper requests. It would be a
good question for the ZooKeeper community.
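
Until that is answered, a caller can at least bound how long it waits for
the callback. The following is only a minimal sketch using the JDK, not
BookKeeper or ZooKeeper code: the class CallbackTimeout, the method await,
and the startOp adapter are hypothetical names, and the operations in main
are simulated.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

final class CallbackTimeout {

    // Starts a callback-style operation and waits at most timeoutMs for it.
    // 'startOp' is a hypothetical adapter: it receives the future and must
    // arrange for the real callback to complete it.
    static <T> T await(Consumer<CompletableFuture<T>> startOp, long timeoutMs)
            throws TimeoutException, InterruptedException, ExecutionException {
        CompletableFuture<T> result = new CompletableFuture<>();
        startOp.accept(result);
        // Throws TimeoutException if the callback has not fired in time.
        return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation whose callback fires immediately.
        String data = CallbackTimeout.<String>await(f -> f.complete("payload"), 1000);
        System.out.println("completed: " + data);

        // Simulated operation whose callback never fires, like the reported hang.
        try {
            CallbackTimeout.<String>await(f -> { }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out instead of hanging");
        }
    }
}
```

With the real clients, the adapter would issue the asynchronous getData or
open call and complete the future from inside its callback. Note that the
timeout only stops the caller from waiting; it does not cancel the
outstanding request.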


>
> Maybe there is room for an improvement that adds a timeout on openLedgerXXX
> operations, but in any case it is strange that the callback is never called.
>
> Unfortunately the problem happens only in integration tests; maybe I can
> try to reproduce it in a BK-only test case.
>
> The case is simple: start ZK + 1 Bookie + 1 BookKeeper, concurrently
> create many ledgers, write to them, and concurrently open them with
> openLedgerNoRecovery from other threads.
> There are no errors in the ZK logs or the BK logs.
>
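
For a BK-only reproduction, a small harness that runs the operations
concurrently and reports which ones never finish may help. This is a sketch
under assumptions, using only the JDK: HangDetector, findHung, and opFactory
are hypothetical names, and each Callable stands in for one
create/write/openLedgerNoRecovery sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

final class HangDetector {

    // Runs 'count' operations on 'threads' threads and returns the indexes
    // of the operations that had not finished when the deadline expired.
    static List<Integer> findHung(int count, int threads, long deadlineMs,
                                  IntFunction<Callable<Void>> opFactory)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> ops = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                ops.add(opFactory.apply(i));
            }
            // invokeAll cancels every task that is still incomplete at the deadline.
            List<Future<Void>> futures =
                    pool.invokeAll(ops, deadlineMs, TimeUnit.MILLISECONDS);
            List<Integer> hung = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                if (futures.get(i).isCancelled()) {
                    hung.add(i);
                }
            }
            return hung;
        } finally {
            // Interrupt anything still running so the pool threads can exit.
            pool.shutdownNow();
        }
    }
}
```

Tasks that are still queued at the deadline are also reported, so the thread
count should be large enough that every operation gets a chance to start.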

Can you turn on debug logging for the BookKeeper client and also ZooKeeper?
There might be logs worth checking.

Another option is to take a TCP dump to trace the ZooKeeper calls and see
whether the getData request and response are received on both sides.


>
> Any suggestions?
>
> Thanks
>
> -- Enrico
>
>
>
