From: Robert Newson
To: user@couchdb.apache.org
Date: Fri, 13 Apr 2012 14:19:24 +0100
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

I think you should point out that "My idea behind these tests is that it may be that your database may be corrupted (or seen as corrupted by BigCouch at the second test)
and what you get is just garbage at a certain document." is based on no
evidence. Nor, if it were true, would it necessarily explain the
observed behavior either.

It would be useful if we could all stick to asserting only things we
know to be true or have reasonable grounds to believe are true.
Unfounded speculation, though offered sincerely, is not helpful on a
mailing list intended to provide assistance.

Thanks,
B.

On 13 April 2012 13:55, CGS wrote:
> Hi Mike,
>
> I haven't used BigCouch so far, which is why I haven't said anything
> until now. Still, thinking about what may be happening there, I propose
> a few tests if you have time:
> 1. Try to replicate the database to another CouchDB.
> 2. If 1 passes, try to replicate to only one node at a time.
> 3. If 2 passes, increase the pool of nodes by 1 and repeat the
> replication (it will surely fail with all 3 nodes at once).
>
> My idea behind these tests is that it may be that your database may be
> corrupted (or seen as corrupted by BigCouch at the second test) and what
> you get is just garbage at a certain document. That's why I proposed the
> first test. The second test is to see if any of the nodes has a problem in
> configuration (or if there is any incompatibility between your CouchDB
> and BigCouch in manipulating your docs). Finally, the third test is to see
> if server/node resources limit the number of replications (and at how many
> it starts to fail).
>
> Can you also check the size of the shards at tests 2 and 3?
>
> If you consider these tests irrelevant, please ignore my
> suggestion.
>
> CGS
>
>
>
> On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber wrote:
>
>> I upped the memory to 6GB on each of the nodes and got exactly the same
>> issue in the same time frame, i.e. the increased RAM did not seem to buy me
>> any additional time.
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 12 April 2012 19:34
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> 2GB total RAM does sound tight. I can only compare to high-volume
>> production clusters, which have much more RAM than this. Given that
>> beam.smp wanted 1.4GB and you have 2GB total, do you know where the
>> rest went? To couchjs processes, by chance? If so, you can reduce the
>> maximum size of that pool in config; I think the default is 50.
>>
>> On 12 April 2012 18:32, Mike Kimber wrote:
>> > Ok, I have 3 nodes all load balanced with HAProxy:
>> >
>> > CentOS 5.8 (virtualised)
>> > 2 cores
>> > 2GB RAM
>> >
>> > I'm trying to replicate about 75K documents which total 6GB when
>> > compacted (on CouchDB 1.2, which has compression turned on). I'm told
>> > they are fairly large documents.
>> >
>> > When it goes pear-shaped, vmstat shows a lot of memory in use:
>> >
>> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>> >  r  b   swpd   free  buff  cache    si    so    bi    bo   in   cs us sy id wa st
>> >  1  2 570576   8808   140   7208  2998  2249  3154  2249 1234  569  1  6  2 91  0
>> >  0  2 569656   9156   156   7504  2330  1899  2405  1904 1246  595  1  5  9 85  0
>> >  1  1 575412   9516   236  14928  1549  2261  3242  2261 1237  593  1  7  1 91  0
>> >  0  2 607092  13220   168   8156  3772  9012  3871  9017 1284  714  1 10  4 85  0
>> >  1  0 444336 857004   220  10212  5781     0  6202     0 1574 1010 13  7 33 47  0
>> >  1  0 442176 870684   428  11052  2049     0  2208   140 2561 1541 17  8 49 26  0
>> >  0  0 442176 813140   460  11968   170     0   348     0 2672 1565 25  9 61  4  0
>> >  0  1 442176 744972   484  12224  5440     0  5493     7 2432  900  8  4 49 40  0
>> >  0  1 442176 714048   484  12296  4547     0  4547     0 1799  827  4  2 50 44  0
>> >  0  1 442176 686304   496  12688  5128     0  5222     0 1696  999  9  2 50 40  0
>> >  0  3 444000   8712   444  12876   299   368   331   380 1294  188 22 20 36 23  0
>> >  0  3 469340  10040   116   7336    29  5087    74  5090 1232  268  3 22  0 75  0
>> >  1  2 584356  10220   124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>> >  0  1 624908  10640   132   7036  6518 12879  6590 12884 1296  717  3 10 29 58  0
>> >  0  2 652556  10948   252  14776  3799  9494  5459  9494 1294  646  2  9 32 57  0
>> >  0  2 677784  10648   244  14528  3819  8196  3819  8201 1274  588  2  7 30 61  0
>> >  0  2 688460   9512   212   8224  3013  4522  3125  4522 1379  519  2  7  6 84  0
>> >  0  3 699164   9888   208   8468  2192  4014  2228  4014 1302  495  1  6 11 83  0
>> >  2  0 713104   9004   144   9192  2606  4490  2848  4490 1350  487  1  8 16 75  0
>> >
>> > It only ever takes out one node at a time and the other nodes seem to be
>> > doing very little while the one node is running out of memory.
>> >
>> > If I kick it off again it processes some more and then spikes the memory
>> > and fails.
>> >
>> > Thanks
>> >
>> > Mike
>> >
>> > PS: hope you enjoyed your CouchDB get-together!
>> >
>> > -----Original Message-----
>> > From: Robert Newson [mailto:rnewson@apache.org]
>> > Sent: 12 April 2012 17:28
>> > To: user@couchdb.apache.org
>> > Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>> >
>> > What kind of load were you putting on the machine?
>> >
>> > On 12 April 2012 17:24, Robert Newson wrote:
>> >> Could you show your vm.args file?
>> >>
>> >> On 12 April 2012 17:23, Robert Newson wrote:
>> >>> Unfortunately your request for help coincided with the two-day CouchDB
>> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>> >>> ways to get bigcouch support, but we happily answer queries here too,
>> >>> when not at the Model UN of CouchDB. :D
>> >>>
>> >>> B.
>> >>>
>> >>> On 12 April 2012 17:10, Mike Kimber wrote:
>> >>>> Looks like this isn't the right place, based on the responses so far.
>> >>>> Shame, I hoped this was going to help solve our index/view rebuild
>> >>>> times etc.
>> >>>>
>> >>>> Mike
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>> >>>> Sent: 10 April 2012 09:20
>> >>>> To: user@couchdb.apache.org
>> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>> >>>>
>> >>>> I'm not sure if this is the correct place to raise an issue I am
>> >>>> having with replicating a standalone CouchDB 1.1.1 to a 3-node BigCouch
>> >>>> cluster. If this is not the correct place, please point me in the right
>> >>>> direction; if it is, does anyone have any ideas why I keep getting the
>> >>>> following error message when I kick off a replication:
>> >>>>
>> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>> >>>> "heap").
>> >>>>
>> >>>> My set-up is:
>> >>>>
>> >>>> Standalone CouchDB 1.1.1 running on CentOS 5.7
>> >>>>
>> >>>> 3-node BigCouch cluster running on CentOS 5.8 with the following
>> >>>> local.ini overrides, pulling from the standalone CouchDB (78K documents)
>> >>>>
>> >>>> [httpd]
>> >>>> bind_address = XXX.XX.X.XX
>> >>>>
>> >>>> [cluster]
>> >>>> ; number of shards for a new database
>> >>>> q = 9
>> >>>> ; number of copies of each shard
>> >>>> n = 1
>> >>>>
>> >>>> [couchdb]
>> >>>> database_dir = /other/bigcouch/database
>> >>>> view_index_dir = /other/bigcouch/view
>> >>>>
>> >>>> The error is always generated on the third node in the cluster and the
>> >>>> server basically maxes out on memory beforehand. The other nodes seem to
>> >>>> be doing very little, but are getting data, i.e. the shard sizes are
>> >>>> growing. I've put the copies per shard down to 1 as currently I'm not
>> >>>> interested in resilience.
>> >>>>
>> >>>> Any help would be greatly appreciated.
>> >>>>
>> >>>> Mike
>> >>>>
>>
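
For scale, the numbers quoted in this thread can be checked with a short sketch. It only does arithmetic on figures that appear above (the size in the eheap_alloc error, the 2GB and 6GB node sizes, and three rows of the vmstat output). The helper names, the reading of "2GB" as 2 GiB, and the swap threshold of 1000 KiB/s are assumptions made for illustration, not anything BigCouch or vmstat defines.

```python
# Figures copied from the thread.
FAILED_ALLOC_BYTES = 1459620480  # size in the eheap_alloc error message

MIB = 1024 ** 2
GIB = 1024 ** 3


def fraction_of_ram(alloc_bytes: int, ram_gib: float) -> float:
    """Share of physical RAM that one allocation of alloc_bytes would need.

    Assumption: the "2GB" / "6GB" quoted in the thread are treated as GiB.
    """
    return alloc_bytes / (ram_gib * GIB)


def is_swapping_heavily(si: int, so: int, threshold: int = 1000) -> bool:
    """Hypothetical heuristic: both swap-in and swap-out exceed the threshold.

    si and so are the vmstat swap columns (KiB/s by default); the threshold
    is an arbitrary illustrative value, not a recommended setting.
    """
    return si > threshold and so > threshold


# (si, so) pairs taken from rows 1, 6 and 13 of the vmstat output above.
VMSTAT_SAMPLE = [(2998, 2249), (2049, 0), (11367, 28722)]

if __name__ == "__main__":
    print(f"failed allocation: {FAILED_ALLOC_BYTES / MIB:.0f} MiB")
    for ram in (2, 6):
        share = fraction_of_ram(FAILED_ALLOC_BYTES, ram)
        print(f"share of a {ram} GiB node: {share:.0%}")
    for si, so in VMSTAT_SAMPLE:
        print(si, so, is_swapping_heavily(si, so))
```

A single heap request of this size is roughly two thirds of a 2 GiB node, which fits the observation that the node is deep in swap just before the failure. It does not by itself explain why the same error appeared after the upgrade to 6GB.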