Return-Path: Delivered-To: apmail-qpid-users-archive@www.apache.org Received: (qmail 24854 invoked from network); 11 Mar 2010 21:14:32 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 11 Mar 2010 21:14:32 -0000 Received: (qmail 66392 invoked by uid 500); 11 Mar 2010 21:13:57 -0000 Delivered-To: apmail-qpid-users-archive@qpid.apache.org Received: (qmail 66368 invoked by uid 500); 11 Mar 2010 21:13:57 -0000 Mailing-List: contact users-help@qpid.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: users@qpid.apache.org Delivered-To: mailing list users@qpid.apache.org Received: (qmail 66360 invoked by uid 99); 11 Mar 2010 21:13:57 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Mar 2010 21:13:57 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of CWoerner@demandbase.com designates 67.192.230.13 as permitted sender) Received: from [67.192.230.13] (HELO 168056-www1.demandbase.com) (67.192.230.13) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Mar 2010 21:13:49 +0000 Received: from mail.demandbase.com (unknown [10.10.10.18]) by 168056-www1.demandbase.com (Postfix) with ESMTP id C37D920179 for ; Thu, 11 Mar 2010 15:13:27 -0600 (CST) Received: from rroby.demandbase.local (10.10.10.134) by DB-SBS08.Demandbase.local (10.10.10.18) with Microsoft SMTP Server id 8.2.176.0; Thu, 11 Mar 2010 13:13:27 -0800 Message-ID: From: Charles Woerner To: "users@qpid.apache.org" In-Reply-To: <1268338543.2121.62.camel@localhost> Content-Type: multipart/alternative; boundary="Apple-Mail-3-124515023" MIME-Version: 1.0 (Apple Message framework v936) Subject: Re: c++ broker startup problem with large journal containing lots of data Date: Thu, 11 Mar 2010 13:13:27 -0800 References: <1268338543.2121.62.camel@localhost> X-Mailer: Apple Mail (2.936) X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail-3-124515023 Content-Type: text/plain; charset="US-ASCII"; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Thanks Kim, 1) I'm using qpid from RedHat MRG 0.5.752581-34 for i386 and the store module is RHM 0.5.3206-27, also for i386. 2) Yes, my messages are persistent. I am aware that transient messages are discarded on recovery. Your comment that durable messages "have their message content discarded when the policy limit is reached" is surprising. I have an 80 GB store and a 1 GB queue with flow-to-disk. My expectation was that I would be able to continue to enqueue durable messages past the 1GB mark (up to 80% of 80 GB) and only the first (or latest, not sure) 1GB of queue data would be held in memory. I expected the dequeues would sort of "backfill" the 1GB memory buffer until the "flowed-to-disk" messages were consumed from the store. I recognize that that this is sort of dual-purposing the backing store to do both journaling and spillover, but that doesn't seem too far fetched. I'm beginning to think this is incorrect. Is there some kind of identity relating available memory to store geometry to queue size (ie. is the following a good rule of thumb "available memory" = "store size" = "sum(max-queue-size of all queues)")? Also, in an attempt to (slowly) recover the queue I allocated 2 GB swap. This staved off the "Cannot read from child process" error with a c++ bad_alloc exception which I could see as a debug statement in the log. However, 2 hours later the system was still in the same unresponsive state (seemingly trying to complete recovery). It had not yet begun listening on 5672. I captured the lsof for qpidd user in case that might also be helpful. Yes, the sfbroker.log file is big and no, truncating it doesn't alleviate the problem :) ~]# lsof -u qpidd COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME qpidd 12365 qpidd cwd DIR 8,1 4096 573441 /root qpidd 12365 qpidd rtd DIR 8,1 4096 2 / qpidd 12365 qpidd txt REG 8,1 76165 1299146 /usr/ sbin/qpidd qpidd 12365 qpidd mem REG 8,1 18031759 1215641 /usr/ lib/libqpidcommon.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 14644 786449 /lib/ libdl-2.5.so qpidd 12365 qpidd mem REG 8,1 97512 1214114 /usr/ lib/libsasl2.so.2.0.22 qpidd 12365 qpidd mem REG 8,1 206396 786499 /lib/ i686/nosegneg/libm-2.5.so qpidd 12365 qpidd mem REG 8,1 44992 786703 /lib/ libgcc_s-4.1.2-20080825.so.1 qpidd 12365 qpidd mem REG 8,1 1620400 789604 /lib/ i686/nosegneg/libc-2.5.so qpidd 12365 qpidd mem REG 8,1 127720 786500 /lib/ i686/nosegneg/libpthread-2.5.so qpidd 12365 qpidd mem REG 8,1 74616 786483 /lib/ libresolv-2.5.so qpidd 12365 qpidd mem REG 8,1 43544 786447 /lib/ libcrypt-2.5.so qpidd 12365 qpidd mem REG 8,1 2632 1213919 /usr/ lib/libaio.so.1.0.1 qpidd 12365 qpidd mem REG 8,1 15584 1247077 /usr/ lib/openais/libcpg.so.2.0.0 qpidd 12365 qpidd mem REG 8,1 14144 786738 /lib/ libuuid.so.1.2 qpidd 12365 qpidd mem REG 8,1 16068 1215646 /usr/ lib/libcman.so.2.0.115 qpidd 12365 qpidd mem REG 8,1 42068 786501 /lib/ i686/nosegneg/librt-2.5.so qpidd 12365 qpidd mem REG 8,1 62328 1215614 /usr/ lib/libboost_filesystem.so.1.33.1 qpidd 12365 qpidd mem REG 8,1 124432 786503 /lib/ ld-2.5.so qpidd 12365 qpidd mem REG 8,1 625546 1266311 /usr/ lib/qpid/daemon/replicating_listener.so qpidd 12365 qpidd mem REG 8,1 213760 1215620 /usr/ lib/libboost_program_options.so.1.33.1 qpidd 12365 qpidd mem REG 8,1 1127841 1266315 /usr/ lib/qpid/daemon/msgstore.so qpidd 12365 qpidd mem REG 8,1 946263 1215643 /usr/ lib/libsslcommon.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 96924 1214052 /usr/ lib/libnssutil3.so qpidd 12365 qpidd mem REG 8,1 925520 1214126 /usr/ lib/libstdc++.so.6.0.8 qpidd 12365 qpidd mem REG 8,1 1109436 1213951 /usr/ lib/libdb_cxx-4.3.so qpidd 12365 qpidd mem REG 8,1 203340 1214124 /usr/ lib/libssl3.so qpidd 12365 qpidd mem REG 8,1 839781 1266317 /usr/ lib/qpid/daemon/xml.so qpidd 12365 qpidd mem REG 8,1 16636786 1215692 /usr/ lib/libqpidbroker.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 226512 1214049 /usr/ lib/libnspr4.so qpidd 12365 qpidd mem REG 8,1 6270148 1266314 /usr/ lib/qpid/daemon/cluster.so qpidd 12365 qpidd mem REG 8,1 9912 1214097 /usr/ lib/libplds4.so qpidd 12365 qpidd mem REG 8,1 6782769 1215639 /usr/ lib/libqpidclient.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 587779 1261584 /usr/ lib/qpid/client/sslconnector.so qpidd 12365 qpidd mem REG 8,1 14008 1214096 /usr/ lib/libplc4.so qpidd 12365 qpidd mem REG 8,1 515215 1266312 /usr/ lib/qpid/daemon/replication_exchange.so qpidd 12365 qpidd mem REG 8,1 558508 1266316 /usr/ lib/qpid/daemon/ssl.so qpidd 12365 qpidd mem REG 8,1 1187124 1214050 /usr/ lib/libnss3.so qpidd 12365 qpidd mem REG 8,1 4155068 1215591 /usr/ lib/libxerces-c.so.27.0 qpidd 12365 qpidd mem REG 8,1 3957632 1215596 /usr/ lib/libxqilla.so.3.0.0 qpidd 12365 qpidd 0u CHR 136,0 2 /dev/ pts/0 qpidd 12365 qpidd 1u CHR 136,0 2 /dev/ pts/0 qpidd 12365 qpidd 2u CHR 136,0 2 /dev/ pts/0 qpidd 12365 qpidd 3r FIFO 0,6 240193 pipe qpidd 12365 qpidd 4w FIFO 0,6 240193 pipe qpidd 12365 qpidd 5w REG 8,1 3556296254 98581 /var/ log/qpid/sfbroker.log qpidd 12365 qpidd 6r CHR 1,9 1065 /dev/ urandom qpidd 12365 qpidd 7r FIFO 0,6 240195 pipe qpidd 12366 qpidd cwd DIR 8,1 4096 557058 /var/ lib/qpidd/.qpidd qpidd 12366 qpidd rtd DIR 8,1 4096 2 / qpidd 12366 qpidd txt REG 8,1 76165 1299146 /usr/ sbin/qpidd qpidd 12366 qpidd mem REG 8,1 18031759 1215641 /usr/ lib/libqpidcommon.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 14644 786449 /lib/ libdl-2.5.so qpidd 12366 qpidd mem REG 8,1 97512 1214114 /usr/ lib/libsasl2.so.2.0.22 qpidd 12366 qpidd mem REG 8,1 206396 786499 /lib/ i686/nosegneg/libm-2.5.so qpidd 12366 qpidd mem REG 8,1 44992 786703 /lib/ libgcc_s-4.1.2-20080825.so.1 qpidd 12366 qpidd mem REG 8,1 1620400 789604 /lib/ i686/nosegneg/libc-2.5.so qpidd 12366 qpidd mem REG 8,1 127720 786500 /lib/ i686/nosegneg/libpthread-2.5.so qpidd 12366 qpidd mem REG 8,1 74616 786483 /lib/ libresolv-2.5.so qpidd 12366 qpidd mem REG 8,1 43544 786447 /lib/ libcrypt-2.5.so qpidd 12366 qpidd mem REG 8,1 2632 1213919 /usr/ lib/libaio.so.1.0.1 qpidd 12366 qpidd mem REG 8,1 15584 1247077 /usr/ lib/openais/libcpg.so.2.0.0 qpidd 12366 qpidd mem REG 8,1 14144 786738 /lib/ libuuid.so.1.2 qpidd 12366 qpidd mem REG 8,1 16068 1215646 /usr/ lib/libcman.so.2.0.115 qpidd 12366 qpidd mem REG 8,1 42068 786501 /lib/ i686/nosegneg/librt-2.5.so qpidd 12366 qpidd mem REG 8,1 62328 1215614 /usr/ lib/libboost_filesystem.so.1.33.1 qpidd 12366 qpidd mem REG 8,1 124432 786503 /lib/ ld-2.5.so qpidd 12366 qpidd mem REG 8,1 625546 1266311 /usr/ lib/qpid/daemon/replicating_listener.so qpidd 12366 qpidd mem REG 8,1 213760 1215620 /usr/ lib/libboost_program_options.so.1.33.1 qpidd 12366 qpidd mem REG 8,1 1127841 1266315 /usr/ lib/qpid/daemon/msgstore.so qpidd 12366 qpidd mem REG 8,1 946263 1215643 /usr/ lib/libsslcommon.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 96924 1214052 /usr/ lib/libnssutil3.so qpidd 12366 qpidd mem REG 8,1 925520 1214126 /usr/ lib/libstdc++.so.6.0.8 qpidd 12366 qpidd mem REG 8,1 1109436 1213951 /usr/ lib/libdb_cxx-4.3.so qpidd 12366 qpidd mem REG 8,1 203340 1214124 /usr/ lib/libssl3.so qpidd 12366 qpidd mem REG 8,1 839781 1266317 /usr/ lib/qpid/daemon/xml.so qpidd 12366 qpidd mem REG 8,1 16636786 1215692 /usr/ lib/libqpidbroker.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 226512 1214049 /usr/ lib/libnspr4.so qpidd 12366 qpidd mem REG 8,1 6270148 1266314 /usr/ lib/qpid/daemon/cluster.so qpidd 12366 qpidd mem REG 8,1 9912 1214097 /usr/ lib/libplds4.so qpidd 12366 qpidd mem REG 8,1 6782769 1215639 /usr/ lib/libqpidclient.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 587779 1261584 /usr/ lib/qpid/client/sslconnector.so qpidd 12366 qpidd mem REG 8,1 14008 1214096 /usr/ lib/libplc4.so qpidd 12366 qpidd mem REG 8,1 515215 1266312 /usr/ lib/qpid/daemon/replication_exchange.so qpidd 12366 qpidd mem REG 8,1 558508 1266316 /usr/ lib/qpid/daemon/ssl.so qpidd 12366 qpidd mem REG 8,1 1187124 1214050 /usr/ lib/libnss3.so qpidd 12366 qpidd mem REG 8,1 4155068 1215591 /usr/ lib/libxerces-c.so.27.0 qpidd 12366 qpidd mem REG 8,1 3957632 1215596 /usr/ lib/libxqilla.so.3.0.0 qpidd 12366 qpidd mem REG 8,96 16384 402653317 /vol/ qpidd/rhm/dat/__db.005 qpidd 12366 qpidd mem REG 8,96 450560 402653316 /vol/ qpidd/rhm/dat/__db.004 qpidd 12366 qpidd mem REG 8,96 294912 402653315 /vol/ qpidd/rhm/dat/__db.003 qpidd 12366 qpidd mem REG 8,96 278528 402653314 /vol/ qpidd/rhm/dat/__db.002 qpidd 12366 qpidd mem REG 8,96 24576 402653313 /vol/ qpidd/rhm/dat/__db.001 qpidd 12366 qpidd 0u CHR 1,3 778 /dev/ null qpidd 12366 qpidd 1u CHR 1,3 778 /dev/ null qpidd 12366 qpidd 2u CHR 1,3 778 /dev/ null qpidd 12366 qpidd 3r FIFO 0,6 240193 pipe qpidd 12366 qpidd 4w FIFO 0,6 240193 pipe qpidd 12366 qpidd 5w REG 8,1 3556296254 98581 /var/ log/qpid/sfbroker.log qpidd 12366 qpidd 6r CHR 1,9 1065 /dev/ urandom qpidd 12366 qpidd 7r 0000 0,10 0 240197 eventpoll qpidd 12366 qpidd 8w FIFO 0,6 240195 pipe qpidd 12366 qpidd 9u unix 0xed2f5040 240196 socket qpidd 12366 qpidd 10wW REG 8,96 0 131 /vol/ qpidd/lock qpidd 12366 qpidd 11r FIFO 0,6 240198 pipe qpidd 12366 qpidd 12w FIFO 0,6 240198 pipe qpidd 12366 qpidd 13u REG 8,96 8192 402653319 /vol/ qpidd/rhm/dat/queues.db qpidd 12366 qpidd 14u REG 8,96 8192 402653320 /vol/ qpidd/rhm/dat/config.db qpidd 12366 qpidd 15u REG 8,96 8192 402653321 /vol/ qpidd/rhm/dat/exchanges.db qpidd 12366 qpidd 16u REG 8,96 8192 402653322 /vol/ qpidd/rhm/dat/messages.db qpidd 12366 qpidd 17u REG 8,96 8192 402653323 /vol/ qpidd/rhm/dat/mappings.db qpidd 12366 qpidd 18u REG 8,96 8192 402653324 /vol/ qpidd/rhm/dat/bindings.db qpidd 12366 qpidd 19u REG 8,96 8192 402653325 /vol/ qpidd/rhm/dat/general.db qpidd 12366 qpidd 20u REG 8,96 10485760 402653318 /vol/ qpidd/rhm/dat/log.0000000001 qpidd 12366 qpidd 21r REG 8,96 1342177792 503316610 /vol/ qpidd/rhm/jrnl/000c/hits/JournalData.0001.jdat On Mar 11, 2010, at 12:15 PM, Kim van der Riet wrote: > A few questions: > > 1. What versions of the store and qpid are you using? > > 2. Are your messages persistent? Flow-to-disk will force transient > messages to disk, but will discard them on recovery; persistent > messages > are written to disk anyway, but have their message content discarded > when the policy limit is reached. I guess the latter is the case here. > > This behavior should be checked out... it looks suspicious. > > On Thu, 2010-03-11 at 11:01 -0800, Charles Woerner wrote: >> A bit more information, at about the time Journal recovery phase 1 is >> complete I see one of the two qpidd processes begin to acquire vast >> amounts of memory (I assume this is the phase where it begins reading >> up to max-queue-size of persistent queue data into memory). Once the >> process acquires about 97% of available memory the system usually >> dies >> with the "critical Broker start-up failed: Cannot read from child >> process.". >> >> I'm not surprised that the system dies when it runs out of memory. >> But I hope someone can help me understand why qpidd is using so much >> memory and whether I have my max-queue-size and store geometry >> configured in a reasonable manner. I would hope that there isn't >> some >> inherent limitation where one can't recover more data from a >> persistent journal than one has available memory. >> >> On Mar 10, 2010, at 10:55 PM, Charles Woerner wrote: >> >>> To follow up, it always fails at startup with this large queue, >>> but I >>> don't always see the "Broker startup failed: cannot read from child >>> process" error. Sometimes it just dies. >>> >> >> __ >> >> Charles Woerner | cwoerner@demandbase.com | demandbase > > > > --------------------------------------------------------------------- > Apache Qpid - AMQP Messaging Implementation > Project: http://qpid.apache.org > Use/Interact: mailto:users-subscribe@qpid.apache.org > __ Charles Woerner | cwoerner@demandbase.com | demandbase | 415.683.2669 --Apple-Mail-3-124515023--