pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] sijie commented on issue #3279: Error while recovering ledger when send messages by producer
Date Wed, 02 Jan 2019 09:08:34 GMT
sijie commented on issue #3279: Error while recovering ledger when send messages by producer
URL: https://github.com/apache/pulsar/issues/3279#issuecomment-450814189
 
 
   I worked with @codelipenghui on debugging this issue. When the issue happened, the following
log statements were found on the broker.
   
   ```
   15:10:00.028 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.LedgerHandle - Conditional update ledger metadata for ledger 153718 failed.
   15:10:00.029 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.LedgerRecoveryOp - Close ledger 153718 failed during recovery:
   ```
   
   So the `LedgerRecoveryException` received on the producer side comes from the broker failing
to update ledger metadata when loading a topic: `Conditional update ledger metadata for
ledger 153718 failed`.
   
   **Why this happened**
   
   The *conditional update failure* can happen when ownership of a topic is transferred
from one broker to another. The transfer can be triggered by various events, for example
topic reassignment after a network partition, load balancing, and so on.
   
   During this period, the old owner is unloading the topic and closing the last ledger in the
topic. Closing the ledger involves updating the ledger metadata. Meanwhile, the new owner is
loading the topic and recovering the last ledger in the topic. At the end of recovery, it also
closes the ledger (which also updates the ledger metadata). These concurrent metadata updates
trigger the "conditional update" failure: one update succeeds and the other fails.
Neither the broker nor the client has retry logic for this case, so the exception propagates
all the way back to the application.
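
The race described above can be illustrated with a minimal, self-contained Java sketch. This is not BookKeeper's code: `Versioned`, `MetadataStore`, and `conditionalUpdate` are hypothetical names, and an in-memory `AtomicReference` stands in for the real metadata store. It only shows why the second of two writers that read the same metadata version fails.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; names do not come from BookKeeper or Pulsar.
final class ConditionalUpdateSketch {

    // Stand-in for ledger metadata: a state plus the version it was written at.
    static final class Versioned {
        final String state;
        final long version;

        Versioned(String state, long version) {
            this.state = state;
            this.version = version;
        }
    }

    // Stand-in for the metadata store, holding a single ledger's metadata.
    static final class MetadataStore {
        private final AtomicReference<Versioned> current =
                new AtomicReference<>(new Versioned("OPEN", 0));

        Versioned read() {
            return current.get();
        }

        // Writes newState only if the stored version still equals expectedVersion.
        boolean conditionalUpdate(long expectedVersion, String newState) {
            Versioned seen = current.get();
            if (seen.version != expectedVersion) {
                return false; // someone else updated the metadata first
            }
            return current.compareAndSet(seen, new Versioned(newState, expectedVersion + 1));
        }
    }

    // Both owners read the same version, then both try to close the ledger.
    static boolean[] simulateRace() {
        MetadataStore store = new MetadataStore();
        Versioned readByOldOwner = store.read(); // old owner, unloading the topic
        Versioned readByNewOwner = store.read(); // new owner, recovering the ledger

        boolean oldOwnerClosed = store.conditionalUpdate(readByOldOwner.version, "CLOSED");
        boolean newOwnerClosed = store.conditionalUpdate(readByNewOwner.version, "CLOSED");
        return new boolean[] {oldOwnerClosed, newOwnerClosed};
    }

    public static void main(String[] args) {
        boolean[] result = simulateRace();
        System.out.println("old owner close succeeded: " + result[0]);
        System.out.println("new owner close succeeded: " + result[1]);
    }
}
```

In this sketch the old owner happens to win. In practice either side can win the race; the point is that exactly one of the two updates is accepted.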
   
   This issue will be fixed by the bk 4.9.0 release, since bk 4.9.0 will handle a conditional
update failure on close and will not throw an exception. (/cc @ivankelly for confirmation)
   
   However, I think there are a few improvements that can be considered on the Pulsar side.
   
   For example, on the producer side, the producer could catch this exception and determine
whether it is retryable. If it is, the producer can retry before propagating the exception
to the application.
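
One way such a retry could look is sketched below. This is a generic wrapper and not part of the Pulsar client API: `callWithRetry`, `TransientLedgerException`, the attempt limit, and the backoff are all assumptions made for illustration. A real implementation would need to decide which exceptions are actually safe to retry.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names exist in the Pulsar client.
final class RetryingSendSketch {

    // Made-up exception type standing in for a retryable ledger recovery error.
    static final class TransientLedgerException extends Exception {
        TransientLedgerException(String message) {
            super(message);
        }
    }

    // Runs the action, retrying while the failure is classified as retryable
    // and attempts remain. Non-retryable failures are rethrown immediately.
    static <T> T callWithRetry(Callable<T> action,
                               Predicate<Exception> isRetryable,
                               int maxAttempts,
                               long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isRetryable.test(e) || attempt == maxAttempts) {
                    throw e; // give up and surface the error to the application
                }
                Thread.sleep(backoffMillis * attempt); // simple linear backoff
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Simulates an action that fails twice with a retryable error, then succeeds.
    // Returns how many times the action was invoked.
    static int demoAttempts() {
        int[] calls = {0};
        try {
            callWithRetry(() -> {
                if (++calls[0] < 3) {
                    throw new TransientLedgerException("simulated recovery failure");
                }
                return "sent";
            }, e -> e instanceof TransientLedgerException, 5, 1L);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("attempts needed: " + demoAttempts());
    }
}
```

Keeping the retry decision in a predicate separates "is this error transient?" from the retry loop itself, so the classification can change without touching the loop.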

