Return-Path: X-Original-To: apmail-zookeeper-bookkeeper-dev-archive@minotaur.apache.org Delivered-To: apmail-zookeeper-bookkeeper-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D362298B0 for ; Fri, 6 Apr 2012 13:05:48 +0000 (UTC) Received: (qmail 43165 invoked by uid 500); 6 Apr 2012 13:05:48 -0000 Delivered-To: apmail-zookeeper-bookkeeper-dev-archive@zookeeper.apache.org Received: (qmail 43138 invoked by uid 500); 6 Apr 2012 13:05:48 -0000 Mailing-List: contact bookkeeper-dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: bookkeeper-dev@zookeeper.apache.org Delivered-To: mailing list bookkeeper-dev@zookeeper.apache.org Received: (qmail 43130 invoked by uid 99); 6 Apr 2012 13:05:48 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Apr 2012 13:05:48 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Apr 2012 13:05:45 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id E9FAD35CF4B for ; Fri, 6 Apr 2012 13:05:23 +0000 (UTC) Date: Fri, 6 Apr 2012 13:05:23 +0000 (UTC) From: "Flavio Junqueira (Updated) (JIRA)" To: bookkeeper-dev@zookeeper.apache.org Message-ID: <1655961638.23290.1333717524023.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <524858698.22986.1333716083990.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (BOOKKEEPER-212) Bookie stops responding when creating and deleting many ledgers MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/BOOKKEEPER-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated BOOKKEEPER-212: ---------------------------------------- Attachment: bookkeeper-server.log I'm attaching the log file of the bookie that became unresponsive. In this run, the app stopped with these error messages: {noformat} 2012-04-06 12:22:07,565 - ERROR - [pool-3-thread-1:PerChannelBookieClient@550] - Add for ledger: 2382, entry: 0 failed on bookie: /IP2:3181 with code: 101 2012-04-06 12:22:07,566 - WARN - [pool-3-thread-1:PendingAddOp@146] - Write did not succeed: 2382, 0 2012-04-06 12:22:07,572 - ERROR - [pool-3-thread-1:LedgerHandle@659] - Could not get additional bookie to remake ensemble, closing ledger: 2382 org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException at org.apache.bookkeeper.client.BKException.create(BKException.java:58) at org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:410) at org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:387) at org.apache.bookkeeper.loader.LedgerLoader.main(LedgerLoader.java:38) 2012-04-06 12:22:12,588 - INFO - [Hashed wheel timer #1:PerChannelBookieClient@447] - Disconnected from bookie: /IP2:3181 2012-04-06 12:22:12,600 - INFO - [Hashed wheel timer #2:PerChannelBookieClient@447] - Disconnected from bookie: /IP1:3181 2012-04-06 12:22:12,627 - INFO - [Hashed wheel timer #3:PerChannelBookieClient@447] - Disconnected from bookie: /IP3:3181 {noformat} > Bookie stops responding when creating and deleting many ledgers > --------------------------------------------------------------- > > Key: BOOKKEEPER-212 > URL: https://issues.apache.org/jira/browse/BOOKKEEPER-212 > Project: Bookkeeper > Issue Type: Bug > Reporter: Flavio Junqueira > Priority: Blocker > Fix For: 4.1.0 > > Attachments: bookkeeper-server.log > > > I have written down a short app to try to reproduce one problematic case reported on the user list. The app does the following: > # It creates initially a number of ledgers, say 2000; > # Once it reaches 2000, for each new ledger it creates, it deletes the one at the head of the list; > # Before closing the ledger, it adds 5 entries of 1k, just to generate some traffic for any given ledger. > What I tried to achieve is to have thousands of active ledgers and delete new ledgers as I create new ones. I'll post a link to my test code later. > At some point, one bookie stops responding. The bookie seems to be up, but it is not responsive. Looking at the logs, this is what I see: > {noformat} > 2012-04-06 12:22:05,765 - INFO [SyncThread:LedgerCacheImpl@682] - Ledger 1726 is evicted from file info cache. > 2012-04-06 12:22:05,769 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1727 is evicted from file info cache. > 2012-04-06 12:22:05,772 - INFO [SyncThread:LedgerCacheImpl@682] - Ledger 1728 is evicted from file info cache. > 2012-04-06 12:22:05,780 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1729 is evicted from file info cache. > 2012-04-06 12:22:05,787 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1730 is evicted from file info cache. > 2012-04-06 12:22:05,794 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1731 is evicted from file info cache. > 2012-04-06 12:22:05,801 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1732 is evicted from file info cache. > 2012-04-06 12:22:05,807 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1733 is evicted from file info cache. > 2012-04-06 12:22:05,822 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1734 is evicted from file info cache. > 2012-04-06 12:22:05,828 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1735 is evicted from file info cache. > 2012-04-06 12:22:05,842 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1736 is evicted from file info cache. > 2012-04-06 12:22:05,851 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1737 is evicted from file info cache. > 2012-04-06 12:22:05,856 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1738 is evicted from file info cache. > 2012-04-06 12:22:05,864 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1739 is evicted from file info cache. > 2012-04-06 12:22:05,874 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1740 is evicted from file info cache. > 2012-04-06 12:22:05,885 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1741 is evicted from file info cache. > 2012-04-06 12:22:05,894 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1742 is evicted from file info cache. > 2012-04-06 12:22:05,902 - INFO [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1743 is evicted from file info cache. > 2012-04-06 12:22:05,987 - INFO [GarbageCollectorThread:LedgerCacheImpl@682] - Ledger 1744 is evicted from file info cache. > 2012-04-06 12:22:05,987 - ERROR [GarbageCollectorThread:GarbageCollectorThread$1@244] - Exception when deleting the ledger index file on the Bookie: > java.io.IOException: /home/fpj/bk/current/1/b/10b.idx not found > at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:118) > at org.apache.bookkeeper.bookie.FileInfo.close(FileInfo.java:194) > at org.apache.bookkeeper.bookie.LedgerCacheImpl.deleteLedger(LedgerCacheImpl.java:619) > at org.apache.bookkeeper.bookie.GarbageCollectorThread$1.gc(GarbageCollectorThread.java:242) > at org.apache.bookkeeper.meta.AbstractZkLedgerManager.doGc(AbstractZkLedgerManager.java:274) > at org.apache.bookkeeper.meta.FlatLedgerManager.garbageCollectLedgers(FlatLedgerManager.java:168) > at org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:237) > at org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:206) > 2012-04-06 12:22:05,987 - INFO [GarbageCollectorThread:LedgerCacheImpl@682] - Ledger 1745 is evicted from file info cache. > {noformat} > There are lots of exceptions like that, but otherwise I don't see anything that could make the bookie unresponsive. I'll upload the logs as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira