From dev-return-22221-apmail-couchdb-dev-archive=couchdb.apache.org@couchdb.apache.org Fri May 11 14:45:54 2012
Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
MIME-Version: 1.0
From: Marco Monteiro
Date: Fri, 11 May 2012 15:45:06 +0100
Subject: Re: CouchDB freezes
To: dev@couchdb.apache.org
Content-Type: multipart/mixed; boundary=e89a8fb1eef87f175d04bfc3c94b

--e89a8fb1eef87f175d04bfc3c94b
Content-Type: multipart/alternative; boundary=e89a8fb1eef87f175904bfc3c949

--e89a8fb1eef87f175904bfc3c949
Content-Type: text/plain; charset=ISO-8859-1

Now the script really is attached. Promise.

On 11 May 2012 15:43, Marco Monteiro wrote:

> I was trying nodeload but could not generate the load I need to trigger
> the problem. I attached the script. Can you tell me how to change the
> script to get to the load I need to trigger the problem?
>
> The attached script was making about 150 requests per minute.
>
> Thanks,
> Marco
>
>
> On 11 May 2012 14:26, Robert Newson wrote:
>
>> Can you reproduce this behavior with other benchmarking tools? ab,
>> nodeload, etc?
>>
>> B.
>>
>> On 11 May 2012 14:18, Marco Monteiro wrote:
>> > Each node.js process had multiple concurrent requests. I just tried with
>> > sequential requests and the problem persists.
>> >
>> > So, now I have 8 node.js processes each sending one write request only
>> > after the previous one is done. And the problem remains.
>> >
>> > The machine is not under any kind of huge load. Both top and iostat
>> > report less than 10% machine use. The machines have an 8-core Xeon with
>> > four 10,000 rpm hard disks in RAID 10 and 16 GB of RAM.
>> >
>> > Note that I'm testing with less than 500 requests per second, at the
>> > moment.
>> >
>> > One more thing: when the problem happens, it's not that the database
>> > becomes slow. It just drops the requests. And reads also fail. For
>> > example, trying to use Futon I get a "connection was reset" message
>> > from Firefox.
>> >
>> > This is on CouchDB 1.2. I'm going to try 1.1.1 next.
>> >
>> > Thanks,
>> > Marco
>> >
>> > On 11 May 2012 13:56, Robert Newson wrote:
>> >
>> >> Perhaps CouchDB on this particular hardware isn't fast enough to cope
>> >> with 4,000 writes per second?
>> >>
>> >> Does your node.js test send every update asynchronously or is it
>> >> carefully controlling qps? For what it's worth, I've benchmarked
>> >> successfully using a node.js library called nodeload
>> >> (https://github.com/benschmaus/nodeload). It's been a while since I
>> >> last used it, and node has changed a few dozen times since then, but
>> >> it was pretty solid and sane when I was using it.
>> >>
>> >> B.
>> >>
>> >> On 11 May 2012 13:48, Marco Monteiro wrote:
>> >> > Thanks, Robert.
>> >> >
>> >> > Disabling delayed commits did make the problem start later, but it is
>> >> > still there.
>> >> >
>> >> > It's funny that the first thing that I checked when I first saw this
>> >> > problem was to make sure that delayed commits were enabled.
>> >> >
>> >> > Thanks,
>> >> > Marco
>> >> >
>> >> > On 11 May 2012 13:20, Robert Newson wrote:
>> >> >
>> >> >> The first thing is to ensure you have disabled delayed commits;
>> >> >>
>> >> >> curl -XPUT -d '"false"' localhost:5984/_config/couchdb/delayed_commits
>> >> >>
>> >> >> This is the production setting anyway (though not the default because
>> >> >> of complaints from incompetent benchmarkers). This will ensure an
>> >> >> fsync for each write and, as a consequence, will greatly smooth your
>> >> >> insert performance. Since you said you were inserting concurrently,
>> >> >> you should not experience a slowdown either.
>> >> >>
>> >> >> B.
>> >> >>
>> >> >> On 11 May 2012 02:42, Marco Monteiro wrote:
>> >> >> > Hello!
>> >> >> >
>> >> >> > I'm running a load test on CouchDB. I have a cluster of 8 node.js
>> >> >> > servers writing to CouchDB. They write about 30000 documents per
>> >> >> > minute (500 per second). There are multiple concurrent requests
>> >> >> > from each server. There are no updates: documents are created and
>> >> >> > not modified.
>> >> >> >
>> >> >> > I first tried CouchDB 1.1.1 from Debian 6.4 apt repo. After a few
>> >> >> > minutes, CouchDB starts freezing for a period of 1 to 3 seconds
>> >> >> > about every 10 seconds. It keeps this behaviour for some time and
>> >> >> > eventually it starts freezing more frequently and for longer
>> >> >> > periods. When the database has about 1.5 million documents,
>> >> >> > CouchDB is freezing for more than 5 seconds each time.
>> >> >> >
>> >> >> > I then tried CouchDB 1.2, from build-couch. The freezes happen
>> >> >> > with it also, but the behavior is much worse: in less than one
>> >> >> > minute it's freezing for 5 seconds or more, and it spends more
>> >> >> > time not doing anything than working.
>> >> >> >
>> >> >> > When testing with 1.1.1 I was writing only to one database. With
>> >> >> > 1.2 I tried with one database and then with multiple databases
>> >> >> > but the problem was exactly the same.
>> >> >> >
>> >> >> > The documents have about 10 properties, only numbers or strings,
>> >> >> > and the strings are small (about 20 chars each). The document IDs
>> >> >> > are generated in the app and have the format
>> >> >> >
>> >> >> >  <milliseconds since epoch>-<random 16 chars string>
>> >> >> >
>> >> >> > When CouchDB freezes, its processor use (from top) goes to zero.
>> >> >> > It does not reply to read or write requests. The disk does not
>> >> >> > seem to be the problem as iostat reports near 0% utilization.
>> >> >> > CPU is mostly idle, and from the 16 GB of RAM, some of it is free
>> >> >> > and is not even used to cache disk.
>> >> >> >
>> >> >> > There are no error messages in the CouchDB log.
>> >> >> >
>> >> >> > I tried this on two different machines and the problem is the same
>> >> >> > on both.
>> >> >> >
>> >> >> > I did not change anything in the configuration files except
>> >> >> > changing the database dir to use a RAID partition.
>> >> >> >
>> >> >> > Does anyone have any idea of what the problem could be?
>> >> >> >
>> >> >> > Any help solving this issue is greatly appreciated.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Marco

--e89a8fb1eef87f175904bfc3c949
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Now the script really is attached. Promise.

On 11 May 2012 15:43, Marco Monteiro <marco@textovirtual.com> wrote:
I was trying nodeload but could not generate the load I need to trigger
the problem. I attached the script. Can you tell me how to change the script
to get to the load I need to trigger the problem?

The attached script was making about 150 requests per minute.

Thanks,
Marco


On 11 May 2012 14:26, Robert Newson <rnewson@apache.org> wrote:
Can you reproduce this behavior with other benchmarking tools? ab,
nodeload, etc?

B.

On 11 May 2012 14:18, Marco Monteiro <marco@textovirtual.com> wrote:
> Each node.js process had multiple concurrent requests. I just tried with
> sequential requests and the problem persists.
>
> So, now I have 8 node.js processes each sending one write request only
> after the previous one is done. And the problem remains.
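
A minimal sketch of the "one write request only after the previous one is done" pattern described above. The names `writeSequentially`, `sendDoc` and `fakeSend` are hypothetical, chosen for illustration; `sendDoc` stands in for whatever performs the real HTTP write to CouchDB and is assumed to return a Promise.

```javascript
// Sketch: issue writes strictly one at a time.
// `sendDoc` is a hypothetical function that performs a single write and
// returns a Promise; it stands in for the real HTTP POST to CouchDB.
async function writeSequentially(docs, sendDoc) {
  const results = [];
  for (const doc of docs) {
    // Wait for the previous write to finish before starting the next one.
    results.push(await sendDoc(doc));
  }
  return results;
}

// Example run with a fake sender that resolves after a short delay.
function fakeSend(doc) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ ok: true, id: doc._id }), 5);
  });
}

writeSequentially([{ _id: 'a' }, { _id: 'b' }], fakeSend)
  .then((results) => console.log(results.length + ' writes completed'));
```

With this shape each process never has more than one request in flight, so any freeze seen under it cannot be blamed on request concurrency inside a single client.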
>
> The machine is not under any kind of huge load. Both top and iostat report
> less than 10% machine use. The machines have an 8-core Xeon with four
> 10,000 rpm hard disks in RAID 10 and 16 GB of RAM.
>
> Note that I'm testing with less than 500 requests per second, at the
> moment.
>
> One more thing: when the problem happens, it's not that the database becomes
> slow. It just drops the requests. And reads also fail. For example, trying
> to use Futon I get a "connection was reset" message from Firefox.
>
> This is on CouchDB 1.2. I'm going to try 1.1.1 next.
>
> Thanks,
> Marco
>
> On 11 May 2012 13:56, Robert Newson <rnewson@apache.org> wrote:
>
>> Perhaps CouchDB on this particular hardware isn't fast enough to cope
>> with 4,000 writes per second?
>>
>> Does your node.js test send every update asynchronously or is it
>> carefully controlling qps? For what it's worth, I've benchmarked
>> successfully using a node.js library called nodeload
>> (https://github.com/benschmaus/nodeload). It's been a while since I
>> last used it, and node has changed a few dozen times since then, but
>> it was pretty solid and sane when I was using it.
>>
>> B.
>>
>> On 11 May 2012 13:48, Marco Monteiro <marco@textovirtual.com> wrote:
>> > Thanks, Robert.
>> >
>> > Disabling delayed commits did make the problem start later, but it is
>> > still there.
>> >
>> > It's funny that the first thing that I checked when I first saw this
>> > problem was to make sure that delayed commits were enabled.
>> >
>> > Thanks,
>> > Marco
>> >
>> > On 11 May 2012 13:20, Robert Newson <rnewson@apache.org> wrote:
>> >
>> >> The first thing is to ensure you have disabled delayed commits;
>> >>
>> >> curl -XPUT -d '"false"' localhost:5984/_config/couchdb/delayed_commits
>> >>
>> >> This is the production setting anyway (though not the default because
>> >> of complaints from incompetent benchmarkers). This will ensure an
>> >> fsync for each write and, as a consequence, will greatly smooth your
>> >> insert performance. Since you said you were inserting concurrently,
>> >> you should not experience a slowdown either.
>> >>
>> >> B.
>> >>
>> >> On 11 May 2012 02:42, Marco Monteiro <marco@textovirtual.com> wrote:
>> >> > Hello!
>> >> >
>> >> > I'm running a load test on CouchDB. I have a cluster of 8 node.js
>> >> > servers writing to CouchDB. They write about 30000 documents per
>> >> > minute (500 per second). There are multiple concurrent requests from
>> >> > each server. There are no updates: documents are created and not
>> >> > modified.
>> >> >
>> >> > I first tried CouchDB 1.1.1 from Debian 6.4 apt repo. After a few
>> >> > minutes, CouchDB starts freezing for a period of 1 to 3 seconds about
>> >> > every 10 seconds. It keeps this behaviour for some time and
>> >> > eventually it starts freezing more frequently and for longer
>> >> > periods. When the database has about 1.5 million documents, CouchDB
>> >> > is freezing for more than 5 seconds each time.
>> >> >
>> >> > I then tried CouchDB 1.2, from build-couch. The freezes happen with
>> >> > it also, but the behavior is much worse: in less than one minute it's
>> >> > freezing for 5 seconds or more, and it spends more time not doing
>> >> > anything than working.
>> >> >
>> >> > When testing with 1.1.1 I was writing only to one database. With 1.2 I
>> >> > tried with one database and then with multiple databases but the
>> >> > problem was exactly the same.
>> >> >
>> >> > The documents have about 10 properties, only numbers or strings, and
>> >> > the strings are small (about 20 chars each). The document IDs are
>> >> > generated in the app and have the format
>> >> >
>> >> >  <milliseconds since epoch>-<random 16 chars string>
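
For illustration, IDs of the shape described above could be produced along these lines. This is only a sketch under assumptions: the names `randomHex16` and `newDocId` are hypothetical, and the random part is taken to be 16 hexadecimal characters, which the message does not specify.

```javascript
// Sketch of an ID generator for the format
// "<milliseconds since epoch>-<random 16 chars string>".
// Function names are illustrative; the hex alphabet is an assumption.
const crypto = require('crypto');

function randomHex16() {
  // 8 random bytes encode to exactly 16 hexadecimal characters.
  return crypto.randomBytes(8).toString('hex');
}

function newDocId(now = Date.now()) {
  // Timestamp first, so IDs sort roughly by creation time.
  return now + '-' + randomHex16();
}

console.log(newDocId());
```

Because the timestamp leads, IDs generated this way are close to monotonically increasing across the writers, apart from clock differences between servers.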
>> >> >
>> >> > When CouchDB freezes, its processor use (from top) goes to zero. It
>> >> > does not reply to read or write requests. The disk does not seem to
>> >> > be the problem as iostat reports near 0% utilization.
>> >> > CPU is mostly idle, and from the 16 GB of RAM, some of it is free and
>> >> > is not even used to cache disk.
>> >> >
>> >> > There are no error messages in the CouchDB log.
>> >> >
>> >> > I tried this on two different machines and the problem is the same on
>> >> > both.
>> >> >
>> >> > I did not change anything in the configuration files except changing
>> >> > the database dir to use a RAID partition.
>> >> >
>> >> > Does anyone have any idea of what the problem could be?
>> >> >
>> >> > Any help solving this issue is greatly appreciated.
>> >> >
>> >> > Thanks,
>> >> > Marco
>> >>
>>


--e89a8fb1eef87f175904bfc3c949-- --e89a8fb1eef87f175d04bfc3c94b Content-Type: application/x-javascript; name="nodeload.js" Content-Disposition: attachment; filename="nodeload.js" Content-Transfer-Encoding: base64 X-Attachment-Id: f_h23cxpv10 dmFyIG5sID0gcmVxdWlyZSgnbm9kZWxvYWQnKTsKdmFyIGxvYWR0ZXN0ID0gbmwucnVuKHsKICAg IGhvc3Q6ICdsb2NhbGhvc3QnLAogICAgcG9ydDogNTk4NCwKICAgIHRpbWVMaW1pdDogMTIwLAog ICAgdGFyZ2V0UnBzOiA1MDAsCiAgICBudW1Vc2VyczogOCwKICAgIHJlcXVlc3RMb29wOiBmdW5j dGlvbiAoZmluaXNoZWQsIGNsaWVudCkgewogICAgICAgIHZhciByZXEgPSBjbGllbnQucmVxdWVz dCgnUE9TVCcsICIvbm8tdXBkYXRlIiwgeyJDb250ZW50LVR5cGUiOiAiYXBwbGljYXRpb24vanNv biJ9KTsKICAgICAgICByZXEub24oInJlc3BvbnNlIiwgZnVuY3Rpb24gKHJlcykgewogICAgICAg ICAgICBmaW5pc2hlZCh7cmVxOiByZXEsIHJlczogcmVzfSk7CiAgICAgICAgfSk7CiAgICAgICAg dmFyIG5ld0lkID0gbmV3SUQoKTsKICAgICAgICB2YXIgaWQgPSBndWlkKCk7CiAgICAgICAgcmVx LmVuZChKU09OLnN0cmluZ2lmeSh7CiAgICAgICAgICAgIF9pZDogbmV3SUQoKSwKICAgICAgICAg ICAgc2lkOiBpZCwKICAgICAgICAgICAgcGlkOiBpZCwKICAgICAgICAgICAgdGltZTogRGF0ZS5u b3coKSwKICAgICAgICAgICAgdWE6ICJ0ZXN0IiwKICAgICAgICAgICAgdWFfc3RyOiAiYXNka2Nh c2prZG5ja2FzZGxjYXNuZGxja25hc2Rsa2Nhc2xkY25ha2xzZG5jbGFzamtkbmNsYWtzZG5jbGth bnNkY2prbGFzbmRsY2tqYW5kYyIsCgogICAgICAgICAgICB0eXBlOiAiYWRsa2ZtYWFzZGNzbGtk bWYiLAoKICAgICAgICAgICAgZGF0YTogeyBzdHI6ICJhc2RrY2FzamtkbmNrYXNkbGNhc25kbGNr bmFzZGxrY2FzbGRjbmFrbHNkbmNsYXNqa2RuY2xha3NkbmNsa2Fuc2RjamtsYXNuZGxja2phbmRj IiB9LAogICAgICAgICAgICBwYWdlOiAibWFpc3VtYXN0cmluZyIsCiAgICAgICAgICAgIHVpZDog aWQsCgogICAgICAgICAgICBpcDogIjEyNy4wLjAuMSIsCgogICAgICAgICAgICBpZDogaWQsCgog ICAgICAgICAgICBwYWdlX3R5cGU6ICJhc2Rma2xhbmRrbGFmbnNsZGtmbiIsCiAgICAgICAgICAg IHJlZmVycmVyOiAiYXNka2Nhc2prZG5ja2FzZGxjYXNuZGxja25hc2Rsa2Nhc2xkY25ha2xzZG5j bGFzamtkbmNsYWtzZG5jbGthbnNkY2prbGFzbmRsY2tqYW5kYyIsCiAgICAgICAgICAgIHdpZHRo OiAxMDAwLAogICAgICAgICAgICBoZWlnaHQ6IDEwMDAKICAgICAgICB9KSk7CiAgICB9Cn0pOwps b2FkdGVzdC5vbignZW5kJywgZnVuY3Rpb24oKSB7IGNvbnNvbGUubG9nKCdMb2FkIHRlc3QgZG9u 
ZS4nKTsgfSk7CgpmdW5jdGlvbiBuZXdJRCAoKSB7CiAgICByZXR1cm4gRGF0ZS5ub3coKSArICIt IiArIGd1aWQoKTsKfQoKZnVuY3Rpb24gZ3VpZCgpIHsKICAgIHJldHVybiAoUzQoKStTNCgpK1M0 KCkrUzQoKSk7CiAgICBmdW5jdGlvbiBTNCAoKSB7CiAgICAgICAgcmV0dXJuICgoKDErTWF0aC5y YW5kb20oKSkqMHgxMDAwMCl8MCkudG9TdHJpbmcoMTYpLnN1YnN0cmluZygxKTsKICAgIH07Cn07 Cg== --e89a8fb1eef87f175d04bfc3c94b--