bookkeeper-user mailing list archives

From Sijie Guo <>
Subject Re: BookKeeper#openLedgerNoRecovery hangs
Date Wed, 19 Jul 2017 09:11:21 GMT
On Wed, Jul 19, 2017 at 4:04 PM, Enrico Olivelli <> wrote:

> Hi,
> in some internal benchmarks we are seeing openLedgerNoRecovery calls
> that hang and never return.
> As far as I can see, that function essentially calls ZooKeeper#getData.

> Does anyone have an idea of how this can happen?

Which version are you testing? Is it related to your recent change bumping
the ZooKeeper version? If that's the case, we should consider rolling back
the ZooKeeper version.

> Is there any implicit timeout on ZK.getData()? I did not find any, and
> personally I have never run into this problem before.

As far as I know, there is no timeout on ZooKeeper requests. It would be a
good question for the ZooKeeper community.
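
In the meantime, a caller-side guard can at least turn a hang into a
visible error. Below is a minimal sketch in plain Java, not BookKeeper or
ZooKeeper code: the class CallbackTimeout, the method await, and the
"starter" parameter are hypothetical names made up for illustration. The
idea is to bridge a callback into a CompletableFuture and wait on it with a
deadline. With BookKeeper, the starter would presumably complete the future
from the callback passed to the async open call; that wiring is an
assumption and is not shown or tested here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class CallbackTimeout {

    /**
     * Starts an asynchronous operation and waits at most timeoutMs for its
     * callback. The (hypothetical) starter receives a future and is expected
     * to complete it from inside the operation's callback.
     */
    public static <T> T await(Consumer<CompletableFuture<T>> starter, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        CompletableFuture<T> result = new CompletableFuture<>();
        starter.accept(result);
        // Throws TimeoutException if the callback has not fired in time.
        return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation whose callback fires on another thread.
        String value = await(f -> new Thread(() -> f.complete("metadata")).start(), 1000);
        System.out.println("completed: " + value);

        // Simulated operation whose callback is never invoked (the hang case).
        try {
            await(f -> { }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out waiting for callback");
        }
    }
}
```

Note that this only bounds how long the caller waits. It does not cancel the
underlying request, so it helps to detect the hang but does not explain why
the callback is never called.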

> Maybe there is room for an improvement to add a timeout on openLedgerXXX
> operations, but anyway it is strange that the callback is never called.
> Unfortunately the problem happens only in integration tests; maybe I can
> work on reproducing it in a BK-only test case.
> The case is simple: start ZK + 1 Bookie + 1 BookKeeper, concurrently
> create many ledgers, write to them, and concurrently open them with
> openLedgerNoRecovery from other threads.
> There are no errors in either the ZK logs or the BK logs.

Can you turn on debug logging for the BookKeeper client and also for
ZooKeeper? The logs might show something worth checking.

Another option is to take a TCP dump to trace the ZooKeeper calls and see
whether the getData request and response are received on both sides.

> Any suggestions?
> Thanks
> -- Enrico
