Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@couchdb.apache.org
MIME-Version: 1.0
In-Reply-To: <A7D50E04F38FD44D9D914F2ABCA592BF2DE1B23003@BE259.mail.lan>
References: <A7D50E04F38FD44D9D914F2ABCA592BF2DE19FE78E@BE259.mail.lan>
	<A7D50E04F38FD44D9D914F2ABCA592BF2DE1B22FE6@BE259.mail.lan>
	<CABvT1DFLb0yh34LXRBVA01yVxRe_GyMCA7wYuwW8HmbXxgDygw@mail.gmail.com>
	<CABvT1DGaXYFY0u9N0G38FHy=VDm81xfMc2xeAkkmDCgQdeZWig@mail.gmail.com>
	<CABvT1DEQ6i76c+gZtR4xi0j5FKPW_W_xe46brUF+=Qz2eiQdyw@mail.gmail.com>
	<A7D50E04F38FD44D9D914F2ABCA592BF2DE1B23003@BE259.mail.lan>
Date: Thu, 12 Apr 2012 19:33:47 +0100
Message-ID: 
 <CABvT1DFBaQ7vGQwODhX_b7iXMtoHwa4-aqcSUP24jNgvJnVC-g@mail.gmail.com>
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
From: Robert Newson <rnewson@apache.org>
To: user@couchdb.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

2GB total RAM does sound tight. I can only compare to high-volume
production clusters, which have much more RAM than this. Given that
beam.smp wanted 1.4GB and you have 2GB total, do you know where the
rest went? To couchjs processes, by chance? If so, you can reduce the
maximum size of that pool in the config; I think the default is 50.
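One rough way to answer "where did the rest go?" is to total resident memory (VmRSS) by process name, which would show whether the beam.smp VM or a crowd of couchjs processes dominates. This is a minimal sketch that assumes Linux with a mounted /proc and Python 3 on the node. RSS counts shared pages once per process and excludes anything already swapped out, so treat the totals as approximate.

```python
import os
from collections import defaultdict


def parse_status(status_text):
    """Return (name, rss_kb) parsed from the text of /proc/<pid>/status.

    VmRSS is reported in kB; kernel threads have no VmRSS line, so 0 is used.
    """
    name, rss_kb = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line[len("Name:"):].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])  # e.g. "VmRSS:   123456 kB"
    return name, rss_kb


def rss_by_command(proc_root="/proc"):
    """Sum VmRSS (kB) per process name over every numeric PID directory."""
    if not os.path.isdir(proc_root):
        return {}  # not Linux, or /proc is not mounted
    totals = defaultdict(int)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/meminfo, /proc/self, ...
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # the process exited after listdir()
        totals[name] += rss_kb
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(rss_by_command().items(), key=lambda kv: kv[1], reverse=True)
    for name, kb in ranked[:10]:
        print(f"{name:<16} {kb / 1024:8.1f} MiB")
```

Running it on the struggling node while a replication is in progress should show whether couchjs totals are large enough to justify shrinking that pool.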

On 12 April 2012 18:32, Mike Kimber <mkimber@kana.com> wrote:
> OK, I have 3 nodes, all load-balanced with HAProxy:
>
> CentOS 5.8 (virtualised)
> 2 Cores
> 2GB RAM
>
> I'm trying to replicate about 75K documents, which total 6GB when compacted (on CouchDB 1.2, which has compression turned on). I'm told they are fairly large documents.
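For a sense of scale, ~6GB over ~75K documents works out to roughly 80 kB per document on disk. The snippet assumes decimal units (1 GB = 10**9 bytes). Because that 6GB was measured with compression on, the JSON each document expands to during replication is probably somewhat larger; that is an assumption, not something measured in this thread.

```python
# Figures from the message above (assumed decimal units: 1 GB = 10**9 bytes).
doc_count = 75_000
compacted_bytes = 6 * 10**9

avg_bytes = compacted_bytes / doc_count
print(f"average compacted document size: {avg_bytes / 1000:.0f} kB")
```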
>
> When it goes pear-shaped, vmstat shows a lot of memory in use:
>
> procs ----------memory--------- ----swap--- -----io---- -system-- -----cpu------
>  r  b   swpd   free  buff cache    si    so    bi    bo   in   cs us sy id wa st
>  1  2 570576   8808   140  7208  2998  2249  3154  2249 1234  569  1  6  2 91  0
>  0  2 569656   9156   156  7504  2330  1899  2405  1904 1246  595  1  5  9 85  0
>  1  1 575412   9516   236 14928  1549  2261  3242  2261 1237  593  1  7  1 91  0
>  0  2 607092  13220   168  8156  3772  9012  3871  9017 1284  714  1 10  4 85  0
>  1  0 444336 857004   220 10212  5781     0  6202     0 1574 1010 13  7 33 47  0
>  1  0 442176 870684   428 11052  2049     0  2208   140 2561 1541 17  8 49 26  0
>  0  0 442176 813140   460 11968   170     0   348     0 2672 1565 25  9 61  4  0
>  0  1 442176 744972   484 12224  5440     0  5493     7 2432  900  8  4 49 40  0
>  0  1 442176 714048   484 12296  4547     0  4547     0 1799  827  4  2 50 44  0
>  0  1 442176 686304   496 12688  5128     0  5222     0 1696  999  9  2 50 40  0
>  0  3 444000   8712   444 12876   299   368   331   380 1294  188 22 20 36 23  0
>  0  3 469340  10040   116  7336    29  5087    74  5090 1232  268  3 22  0 75  0
>  1  2 584356  10220   124  6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>  0  1 624908  10640   132  7036  6518 12879  6590 12884 1296  717  3 10 29 58  0
>  0  2 652556  10948   252 14776  3799  9494  5459  9494 1294  646  2  9 32 57  0
>  0  2 677784  10648   244 14528  3819  8196  3819  8201 1274  588  2  7 30 61  0
>  0  2 688460   9512   212  8224  3013  4522  3125  4522 1379  519  2  7  6 84  0
>  0  3 699164   9888   208  8468  2192  4014  2228  4014 1302  495  1  6 11 83  0
>  2  0 713104   9004   144  9192  2606  4490  2848  4490 1350  487  1  8 16 75  0
>
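One way to summarise a vmstat capture like the one above is to count the intervals with non-zero swap traffic (si/so), then pick out the peak swap-out rate and the lowest free memory. This is a minimal Python 3 sketch. The three sample rows are copied from the table above, and the parser assumes the standard procps column header (`r b swpd free ...`).

```python
SAMPLE = """\
procs ----------memory--------- ----swap--- -----io---- -system-- -----cpu------
 r  b   swpd   free  buff cache    si    so    bi    bo   in   cs us sy id wa st
 1  2 570576   8808   140  7208  2998  2249  3154  2249 1234  569  1  6  2 91  0
 1  0 444336 857004   220 10212  5781     0  6202     0 1574 1010 13  7 33 47  0
 1  2 584356  10220   124  6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
"""


def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name.

    The line starting with "r" supplies the column names; any later line that
    is all integers with the same number of fields is treated as a data row.
    """
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            columns = fields
        elif columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def swap_summary(rows):
    """Count intervals with any swap-in/out and report peak so and minimum free."""
    if not rows:
        raise ValueError("no vmstat rows found")
    return {
        "intervals": len(rows),
        "swapping_intervals": sum(1 for r in rows if r["si"] or r["so"]),
        "peak_so": max(r["so"] for r in rows),
        "min_free": min(r["free"] for r in rows),
    }


if __name__ == "__main__":
    print(swap_summary(parse_vmstat(SAMPLE)))
```

Pasting the full capture into `SAMPLE` gives the same summary for all nineteen intervals.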
> It only ever takes out one node at a time, and the other nodes seem to be doing very little while the one node is running out of memory.
>
> If I kick it off again, it processes some more and then spikes the memory and fails.
>
> Thanks
>
> Mike
>
> PS: hope you enjoyed your CouchDB get-together!
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 12 April 2012 17:28
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> What kind of load were you putting the machine on?
>
> On 12 April 2012 17:24, Robert Newson <rnewson@apache.org> wrote:
>> Could you show your vm.args file?
>>
>> On 12 April 2012 17:23, Robert Newson <rnewson@apache.org> wrote:
>>> Unfortunately your request for help coincided with the two-day CouchDB
>>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>>> ways to get bigcouch support, but we happily answer queries here too,
>>> when not at the Model UN of CouchDB. :D
>>>
>>> B.
>>>
>>> On 12 April 2012 17:10, Mike Kimber <mkimber@kana.com> wrote:
>>>> Looks like this isn't the right place, based on the responses so far. Shame; I hoped this was going to help solve our index/view rebuild times etc.
>>>>
>>>> Mike
>>>>
>>>> -----Original Message-----
>>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>> Sent: 10 April 2012 09:20
>>>> To: user@couchdb.apache.org
>>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>>
>>>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone CouchDB 1.1.1 to a 3-node BigCouch cluster. If this is not the correct place, please point me in the right direction; if it is, does anyone have any ideas why I keep getting the following error message when I kick off a replication:
>>>>
>>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>>
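The size in this error is a single request from the Erlang VM's process-heap allocator (eheap_alloc). Converting it shows it is the "1.4GB" mentioned in the reply at the top, and roughly two thirds of a 2GB node on its own. The snippet assumes "2GB RAM" means 2 GiB.

```python
requested = 1_459_620_480  # bytes, from the eheap_alloc error above
ram_bytes = 2 * 2**30      # assumes the node's "2GB RAM" means 2 GiB

print(f"{requested / 10**9:.2f} GB = {requested / 2**30:.2f} GiB")
print(f"{requested / ram_bytes:.0%} of the node's RAM in a single allocation")
```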
>>>> My set-up is:
>>>>
>>>> Standalone CouchDB 1.1.1 running on CentOS 5.7
>>>>
>>>> 3-node BigCouch cluster running on CentOS 5.8 with the following local.ini overrides, pulling from the standalone CouchDB (78K documents):
>>>>
>>>> [httpd]
>>>> bind_address = XXX.XX.X.XX
>>>>
>>>> [cluster]
>>>> ; number of shards for a new database
>>>> q = 9
>>>> ; number of copies of each shard
>>>> n = 1
>>>>
>>>> [couchdb]
>>>> database_dir = /other/bigcouch/database
>>>> view_index_dir = /other/bigcouch/view
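To make the [cluster] settings concrete: q = 9 splits each new database into nine shards and n = 1 keeps a single copy of each. On three nodes, every document therefore lives on exactly one node, and if the shards are spread evenly each node holds three of them. The sketch below illustrates this in Python 3. The node names, the round-robin placement, and hashing the raw document id with zlib.crc32 are all hypothetical simplifications; BigCouch's own placement and hashing (in its mem3 component) may differ.

```python
import zlib

Q = 9                                # [cluster] q: shards per new database
N = 1                                # [cluster] n: copies of each shard
NODES = ["node1", "node2", "node3"]  # hypothetical node names
RING = 2**32                         # hash space split into Q ranges (assumption)


def shard_ranges(q):
    """Split [0, 2**32) into q contiguous (start, end) ranges, end inclusive."""
    step = RING // q
    return [
        (i * step, RING - 1 if i == q - 1 else (i + 1) * step - 1)
        for i in range(q)
    ]


def placement(q, n, nodes):
    """Hypothetical round-robin layout: copy c of shard i goes to nodes[(i + c) % len(nodes)]."""
    return {i: [nodes[(i + c) % len(nodes)] for c in range(n)] for i in range(q)}


def shard_for(doc_id, q):
    """Map a doc id to a shard index by hashing (zlib.crc32 here is a simplification)."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    return min(h // (RING // q), q - 1)  # the last range absorbs the remainder


if __name__ == "__main__":
    layout = placement(Q, N, NODES)
    for node in NODES:
        held = [i for i, copies in layout.items() if node in copies]
        print(node, "holds shards", held)
    print("doc 'example-id' -> shard", shard_for("example-id", Q))
```

With n = 1 there is no second copy, so losing any one node makes about a third of the documents unavailable, which matches the stated trade-off of not needing resilience.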
>>>>
>>>> The error is always generated on the third node in the cluster, and the server basically maxes out on memory beforehand. The other nodes seem to be doing very little, but are getting data, i.e. the shard sizes are growing. I've put the copies per shard down to 1 as I'm currently not interested in resilience.
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Mike
>>>>