From: Marco Monteiro
Date: Fri, 11 May 2012 15:58:14 +0100
Subject: Re: CouchDB freezes
To: dev@couchdb.apache.org

And that was 150 requests per second, of course, not per minute.

Also, I tried my original test with about 200 requests per second and I
think that the problem goes away (at least I didn't see any problem for a
couple of minutes).

Thanks,
Marco

On 11 May 2012 15:45, Marco Monteiro wrote:

> Now the script really is attached. Promise.
>
>
> On 11 May 2012 15:43, Marco Monteiro wrote:
>
>> I was trying nodeload but could not generate the load I need to trigger
>> the problem. I attached the script. Can you tell me how to change the
>> script to get to the load I need to trigger the problem?
>>
>> The attached script was making about 150 requests per minute.
>>
>> Thanks,
>> Marco
>>
>>
>> On 11 May 2012 14:26, Robert Newson wrote:
>>
>>> Can you reproduce this behavior with other benchmarking tools? ab,
>>> nodeload, etc?
>>>
>>> B.
>>>
>>> On 11 May 2012 14:18, Marco Monteiro wrote:
>>> > Each node.js process had multiple concurrent requests. I just tried
>>> > with sequential requests and the problem persists.
>>> >
>>> > So, now I have 8 node.js processes, each sending one write request
>>> > only after the previous one is done. And the problem remains.
>>> >
>>> > The machine is not under any kind of huge load. Both top and iostat
>>> > report less than 10% machine use.
>>> > The machines have 8-core Xeons, 4 10000 rpm hard disks in RAID 10,
>>> > and 16 GB of RAM.
>>> >
>>> > Note that I'm testing with less than 500 requests per second, at the
>>> > moment.
>>> >
>>> > One more thing: when the problem happens, it's not that the database
>>> > becomes slow. It just drops the requests. And reads also fail. For
>>> > example, trying to use Futon I get a "connection was reset" message
>>> > from Firefox.
>>> >
>>> > This is on CouchDB 1.2. I'm going to try 1.1.1 next.
>>> >
>>> > Thanks,
>>> > Marco
>>> >
>>> > On 11 May 2012 13:56, Robert Newson wrote:
>>> >
>>> >> Perhaps CouchDB on this particular hardware isn't fast enough to cope
>>> >> with 4,000 writes per second?
>>> >>
>>> >> Does your node.js test send every update asynchronously or is it
>>> >> carefully controlling qps? For what it's worth, I've benchmarked
>>> >> successfully using a node.js library called nodeload
>>> >> (https://github.com/benschmaus/nodeload). It's been a while since I
>>> >> last used it, and node has changed a few dozen times since then, but
>>> >> it was pretty solid and sane when I was using it.
>>> >>
>>> >> B.
>>> >>
>>> >> On 11 May 2012 13:48, Marco Monteiro wrote:
>>> >> > Thanks, Robert.
>>> >> >
>>> >> > Disabling delayed commits did make the problem start later, but it
>>> >> > is still there.
>>> >> >
>>> >> > It's funny that the first thing that I checked when I first saw
>>> >> > this problem was to make sure that delayed commits were enabled.
>>> >> >
>>> >> > Thanks,
>>> >> > Marco
>>> >> >
>>> >> > On 11 May 2012 13:20, Robert Newson wrote:
>>> >> >
>>> >> >> The first thing is to ensure you have disabled delayed commits:
>>> >> >>
>>> >> >> curl -XPUT -d '"false"' localhost:5984/_config/couchdb/delayed_commits
>>> >> >>
>>> >> >> This is the production setting anyway (though not the default
>>> >> >> because of complaints from incompetent benchmarkers).
>>> >> >> This will ensure an fsync for each write and, as a consequence,
>>> >> >> will greatly smooth your insert performance. Since you said you
>>> >> >> were inserting concurrently, you should not experience a slowdown
>>> >> >> either.
>>> >> >>
>>> >> >> B.
>>> >> >>
>>> >> >> On 11 May 2012 02:42, Marco Monteiro wrote:
>>> >> >> > Hello!
>>> >> >> >
>>> >> >> > I'm running a load test on CouchDB. I have a cluster of 8 node.js
>>> >> >> > servers writing to CouchDB. They write about 30000 documents per
>>> >> >> > minute (500 per second). There are multiple concurrent requests
>>> >> >> > from each server. There are no updates: documents are created
>>> >> >> > and not modified.
>>> >> >> >
>>> >> >> > I first tried CouchDB 1.1.1 from the Debian 6.4 apt repo. After a
>>> >> >> > few minutes, CouchDB starts freezing for a period of 1 to 3
>>> >> >> > seconds about every 10 seconds. It keeps this behaviour for some
>>> >> >> > time and eventually it starts freezing more frequently and for
>>> >> >> > longer periods. When the database has about 1.5 million
>>> >> >> > documents, CouchDB is freezing for more than 5 seconds each time.
>>> >> >> >
>>> >> >> > I then tried CouchDB 1.2, from build-couch. The freezes happen
>>> >> >> > with it also, but the behavior is much worse: in less than one
>>> >> >> > minute it's freezing for 5 seconds or more, and it spends more
>>> >> >> > time not doing anything than working.
>>> >> >> >
>>> >> >> > When testing with 1.1.1 I was writing only to one database. With
>>> >> >> > 1.2 I tried with one database and then with multiple databases,
>>> >> >> > but the problem was exactly the same.
>>> >> >> >
>>> >> >> > The documents have about 10 properties, only numbers or strings,
>>> >> >> > and the strings are small (about 20 chars each).
>>> >> >> > The document IDs are generated in the app and have the format
>>> >> >> >
>>> >> >> > -
>>> >> >> >
>>> >> >> > When CouchDB freezes, its processor use (from top) goes to zero.
>>> >> >> > It does not reply to read or write requests. The disk does not
>>> >> >> > seem to be the problem, as iostat reports near 0% utilization.
>>> >> >> > CPU is mostly idle, and from the 16 GB of RAM, some of it is free
>>> >> >> > and is not even used to cache disk.
>>> >> >> >
>>> >> >> > There are no error messages in the CouchDB log.
>>> >> >> >
>>> >> >> > I tried this on two different machines and the problem is the
>>> >> >> > same on both.
>>> >> >> >
>>> >> >> > I did not change anything in the configuration files except
>>> >> >> > changing the database dir to use a RAID partition.
>>> >> >> >
>>> >> >> > Does anyone have any idea of what the problem could be?
>>> >> >> >
>>> >> >> > Any help solving this issue is greatly appreciated.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Marco
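
For anyone who wants to repeat the sequential test from this thread at a
controlled request rate, here is a minimal sketch of a paced sender. It is
not the attached script and it does not talk to CouchDB itself: run_paced
and its parameters are hypothetical names, and send is a placeholder for
whatever performs one write (for example an HTTP request that creates one
document). It issues one request at a time, each only after the previous
one has returned, and schedules every request against the start time so a
single slow response does not permanently lower the achieved rate.

```python
import time
from typing import Callable


def run_paced(
    send: Callable[[int], None],
    rate_per_sec: float,
    total: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call send(i) for i in range(total), one at a time, at a target rate.

    send is a placeholder (an assumption of this sketch) for the code that
    performs one write. Returns the elapsed time in seconds.
    """
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    interval = 1.0 / rate_per_sec
    start = clock()
    for i in range(total):
        # Request i is due at a fixed offset from the start, not from the
        # previous request, so delays do not accumulate.
        due = start + i * interval
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        # Blocks until the request completes; the next one is sent only
        # after this call returns.
        send(i)
    return clock() - start
```

With a real sender, run_paced(send_one_document, rate_per_sec=500, total=30000)
would aim for the 500 writes per second (30000 per minute) mentioned in the
original report, where send_one_document is again a hypothetical function.
If each request takes longer than 1/rate seconds, the loop simply runs as
fast as the requests allow, and the returned elapsed time shows the rate
actually achieved.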