Return-Path: Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: (qmail 13577 invoked from network); 23 Sep 2009 19:02:51 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 23 Sep 2009 19:02:51 -0000 Received: (qmail 42318 invoked by uid 500); 23 Sep 2009 19:02:51 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 42245 invoked by uid 500); 23 Sep 2009 19:02:50 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 42235 invoked by uid 99); 23 Sep 2009 19:02:50 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Sep 2009 19:02:50 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of jchris@gmail.com designates 209.85.222.198 as permitted sender) Received: from [209.85.222.198] (HELO mail-pz0-f198.google.com) (209.85.222.198) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Sep 2009 19:02:38 +0000 Received: by pzk36 with SMTP id 36so749410pzk.30 for ; Wed, 23 Sep 2009 12:02:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=azyxfatf31BZUGT/uOAUMhmeTrVfbmzLyYl1Ygl3bU8=; b=KF5kOuFBhTaxj6LZ0ANVYUI7stJYxV2HqXP0WdmX5Ss3LD4eWTrogydIwhngp43BDG 49K0DGVAn33+xE6ZZuTybXOG2zJ6i8D5IMyIswFmGuYNlYO/WkAMGbosJ8gTAVyj9v9d PSihFsL6071Io8plfcU61x5b62DNbKL01SdwA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=UF3BcCutQ3bi+KAUbAMNQc2X/k4N2J/d7ate9uTqC3bZF1n54H23deAAphqOGfi4Fb rjbj2dABesIo0oYdmB429BKFc6cliWhllkYLPzDZRjCBBV0EhPe/q2NuBbMGgVFbX0pr zwvWU+U31eY0D0jyIGjcsm/Tv/9mQ3xOtHFAQ= MIME-Version: 1.0 Sender: jchris@gmail.com Received: by 10.140.165.7 with SMTP id n7mr159703rve.182.1253732537169; Wed, 23 Sep 2009 12:02:17 -0700 (PDT) In-Reply-To: <8b1c89270909231127h46a33aaby817c35125e86e51c@mail.gmail.com> References: <8b1c89270909190513r1494523ew6a9cb363269c3f04@mail.gmail.com> <8b1c89270909192234g2ab4ba86u4e843f032a02efff@mail.gmail.com> <8b1c89270909210019n35c874e1udf31f1dd5896ca73@mail.gmail.com> <46aeb24f0909210516m519962ddsc67b9d23b3c8201c@mail.gmail.com> <8b1c89270909210537n4ad9a70rd4e0b5fd213f1231@mail.gmail.com> <8b1c89270909221011i3ab8150cnec997e64e9edc1c6@mail.gmail.com> <8b1c89270909221137n47b3f44bx676dd242d247c68e@mail.gmail.com> <8b1c89270909231127h46a33aaby817c35125e86e51c@mail.gmail.com> Date: Wed, 23 Sep 2009 12:02:17 -0700 X-Google-Sender-Auth: 2b1fcd6c72300536 Message-ID: Subject: Re: Query server perfromance issues .. From: Chris Anderson To: dghosh@acm.org, dev@couchdb.apache.org Cc: Paul Davis Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org On Wed, Sep 23, 2009 at 11:27 AM, Debasish Ghosh wrote: > Looks like I found the problem .. is this an intended change in CouchDB .= . > > CouchDB wiki documents the following for processing "reset" by the view s= erver : > > """ > CouchDB sends: > ["reset"]\n > The view server responds: > true\n > """ > Accordingly I was doing a pattern match in my query server, expecting > ["reset"] .. > > In the latest snapshot, I did a trace and found that CouchDB actually > sends ["reset", {"reduce_limit":true}] .. hence I was getting an error > and the query server closes every time .. > > Is this the current specification of "reset" ? I changed my code to do > the corresponding pattern match .. and it now runs fine! > > Please confirm. > The query_server_spec.rb should cover this, but perhaps it doesn't. I guess part of the issue is that JS is flexible about function arity in a way that other languages are not, which makes it really easy to absorb these sorts of differences. The command is currently ["reset", query_server_options] although the JS server works fine if it just gets ["reset"] as well. Chris > Thanks. > - Debasish > > On Wed, Sep 23, 2009 at 12:07 AM, Debasish Ghosh > wrote: >> Thanks for the suggestions. I have not yet tried query_server_spec.rb. >> Will do soon to check. Though I logged everything that goes between >> couch server and the query server. The query server does get null from >> readLine of System.in with the later snapshots of the codebase that >> shuts it down. I need to investigate more on how it gets this. But as >> I mentioned before as well, the same query server runs fine with the >> earlier snapshot. >> >> Will let u know if I find anything meaningful. >> >> Thanks. >> - Debasish >> >> On Tue, Sep 22, 2009 at 11:19 PM, Paul Davis >> wrote: >>> On Tue, Sep 22, 2009 at 1:11 PM, Debasish Ghosh >>> wrote: >>>>> It may be that we're flushing the socket with no data, and the Scala >>>>> server is interpreting that as null input. The JS client uses >>>>> readline() implemented in C, so it shouldn't have access to data unti= l >>>>> a line break has been sent by CouchDB. >>>> >>>> readLine blocks .. right .. and only comes out with the null input. >>>> The question is how it gets this null string with the new version of >>>> CouchDB. >>>> Is there something different that you were doing in earlier versions. >>>> Just wondering how it still runs with an earlier snapshot of CouchDB >>>> .. >>>> >>>> Thanks. >>>> - Debasish >>>> >>> >>> I'm still leaning towards the theory that your server is returning >>> something that CouchDB doesn't expect. When this happens the Erlang >>> process controller will shut down the view server by closing its input >>> stream. Though, theoretically, couchspawnkillable should kill -KILL >>> the process too, unless there's a tad bit of delay that occurs during >>> which you're spinning over the stdin stream returning NULL. >>> >>> Did you ever try adding tests to query_server_spec.rb and running that >>> way? I still need to modify that to make it more friendly to run >>> external view engines, but with a bit of hacking it should at least >>> point to the inconsistency. >>> >>> Paul Davis >>> >>>> On Mon, Sep 21, 2009 at 6:07 PM, Debasish Ghosh >>>> wrote: >>>>> >>>>> The actual code is something like this .. >>>>> var s =3D isr.readLine >>>>> while (s !=3D null) { >>>>> =A0=A0 =A0// do stuff >>>>> =A0=A0 =A0s =3D isr.readLine >>>>> } >>>>> I wrote the other version just to log what I get back. Now this same = version works ok with the earlier version of the couchdb server. That's wha= t beats me here .. >>>>> Thanks. >>>>> - Debasish >>>>> >>>>> On Mon, Sep 21, 2009 at 5:46 PM, Robert Newson wrote: >>>>>> >>>>>> I claim you are ignoring null here because of your comment; >>>>>> >>>>>> while (true) { >>>>>> =A0s =3D inputstreamreader.readLine >>>>>> =A0if (s =3D=3D null) // ignore >>>>>> =A0else >>>>>> =A0toJson(s) match { >>>>>> =A0 //.. process reset, add_fun etc. >>>>>> =A0} >>>>>> } >>>>>> >>>>>> When System.in is closed this loop will spin; readLine() will always >>>>>> return null. Since System.in is only closed when the JVM is exiting, >>>>>> it is never correct to ignore it and continue processing. >>>>>> >>>>>> The loop I presented is not the same as yours as mine will correctly >>>>>> exit on process termination. >>>>>> >>>>>> readLine() *cannot* return null under any circumstance but the close >>>>>> of the stream (couchdb cannot pass you null this way). System.in is >>>>>> never closed unless the process itself is exiting, and it is never >>>>>> reopened. >>>>>> >>>>>> The mishandling of readLine() is probably hiding the real problem. I >>>>>> would guess you pass invalid JSON to couchdb, or fail to return >>>>>> anything at all, under some conditions. Couch then kills your view >>>>>> server (and would then restart it). The view server, rather than >>>>>> gracefully exiting when this happens, will simple spin, never exitin= g. >>>>>> >>>>>> B. >>>>>> >>>>>> On Mon, Sep 21, 2009 at 8:19 AM, Debasish Ghosh >>>>>> wrote: >>>>>> > It's in fact referring to a reader that wraps System.in. >>>>>> > readLine returns null on end of file, but the earlier version of t= he >>>>>> > snapshot handles it and does not close the query server process. W= hile the >>>>>> > new server seems to get throttled in the while loop. In fact this = is one >>>>>> > difference that I forgot to=A0mention. In the earlier version the = query server >>>>>> > does not close, while in the new version it gets closed and restar= ted for >>>>>> > every view operation. Maybe it's getting closed because of the nul= l. I can >>>>>> > figure that out from the logs. Is this an intentional change in >>>>>> > implementation ? >>>>>> > Robert - >>>>>> > I am not ignoring null. The while loop is very similar to what u m= ention. I >>>>>> > switched to the while true version just to log and see if nulls ar= e getting >>>>>> > returned. >>>>>> > Thanks. >>>>>> > - Debasish >>>>>> > >>>>>> > On Mon, Sep 21, 2009 at 3:53 AM, Paul Davis >>>>>> > wrote: >>>>>> >> >>>>>> >> On Sun, Sep 20, 2009 at 1:34 AM, Debasish Ghosh >>>>>> >> wrote: >>>>>> >> > Chris - >>>>>> >> > In my query server code, I logged everything that gets exchange= d between >>>>>> >> > the >>>>>> >> > couchdb server process and the query server. The difference tha= t I >>>>>> >> > noticed >>>>>> >> > with the new changes are that the couchdb server sends a huge n= umber of >>>>>> >> > null >>>>>> >> > strings to the view server which chokes the latter. In the snip= pet that >>>>>> >> > I >>>>>> >> > wrote before .. >>>>>> >> > >>>>>> >> > while (true) { >>>>>> >> >>> > =A0s =3D inputstreamreader.readLine =A0// this reads from s= tdin >>>>>> >> >>> > =A0if (s =3D=3D null) // ignore >>>>>> >> >>> > =A0else >>>>>> >> >>> > =A0toJson(s) match { >>>>>> >> >>> > =A0 =A0//.. process reset, add_fun etc. >>>>>> >> >>> > =A0} >>>>>> >> >>> > } >>>>>> >> > >>>>>> >> >>>>>> >> Does inputstreamreader.readLine refer to this function: >>>>>> >> >>>>>> >> >>>>>> >> http://java.sun.com/j2se/1.5.0/docs/api/java/io/BufferedReader.ht= ml#readLine%28%29 >>>>>> >> >>>>>> >> If so, and that's returning null, then is it signaling that Couch= DB >>>>>> >> has tried to close the input stream? >>>>>> >> >>>>>> >> Paul >>>>>> >> >>>>>> >> > I put logs in the true branch of if (s =3D=3D null) and moments= later I >>>>>> >> > found a >>>>>> >> > log created of size 10 MB where the view server gets null strin= gs from >>>>>> >> > stdin. This may give some clues towards the problem. >>>>>> >> > >>>>>> >> > Hope this helps. >>>>>> >> > - Debasish >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > On Sun, Sep 20, 2009 at 10:56 AM, Chris Anderson >>>>>> >> > wrote: >>>>>> >> > >>>>>> >> >> On Sat, Sep 19, 2009 at 10:09 PM, Debasish Ghosh >>>>>> >> >> wrote: >>>>>> >> >> > Yes, actually the reason I brought it up is that the same qu= ery >>>>>> >> >> > server >>>>>> >> >> runs >>>>>> >> >> > fine with the earlier version, while it stumbles with the ch= anges >>>>>> >> >> > incorporated later. Actually there is a really really big di= fference >>>>>> >> >> > in >>>>>> >> >> > performance which is primarily because of the timeouts. Than= ks for >>>>>> >> >> deciding >>>>>> >> >> > to look into it. I will currently stick around with the Apri= l >>>>>> >> >> > snapshot.Please post your findings on this list - I will be = happy to >>>>>> >> >> upgrade >>>>>> >> >> > to the latest. >>>>>> >> >> > Thanks. >>>>>> >> >> > - Debasish >>>>>> >> >> >>>>>> >> >> I think what we'll need is a way to get visibility between the= beam >>>>>> >> >> process and the query server. this could be accomplished with = a simple >>>>>> >> >> log wrapper around the query server, logging both stdin and st= dout to >>>>>> >> >> individual files. >>>>>> >> >> >>>>>> >> >> I like the idea of implementing it as a wrapper because then w= e can >>>>>> >> >> wrap it around the scala as well as the JS query server (and o= ther >>>>>> >> >> languages), and get complete transparency into what's going ov= er the >>>>>> >> >> wire. >>>>>> >> >> >>>>>> >> >> This is definitely turning into dev@ territory so I'm moving t= his >>>>>> >> >> thread >>>>>> >> >> there. >>>>>> >> >> >>>>>> >> >> Chris >>>>>> >> >> >>>>>> >> >> > >>>>>> >> >> > On Sun, Sep 20, 2009 at 3:41 AM, Chris Anderson >>>>>> >> >> wrote: >>>>>> >> >> > >>>>>> >> >> >> On Sat, Sep 19, 2009 at 11:40 AM, Debasish Ghosh >>>>>> >> >> >> wrote: >>>>>> >> >> >> > Here are some additional behavior changes that I am notic= ing >>>>>> >> >> >> > between >>>>>> >> >> the >>>>>> >> >> >> 2 >>>>>> >> >> >> > versions .. >>>>>> >> >> >> >>>>>> >> >> >> The other big change is in couch_os_process, the addition o= f >>>>>> >> >> >> couchspawnkillable - maybe this is acting up on your system= . >>>>>> >> >> >> >>>>>> >> >> >> Partially I'm interested in getting to the bottom of this b= ecause it >>>>>> >> >> >> could be that it's inefficient with the JS query server, bu= t not >>>>>> >> >> >> causing errors, and we just haven't noticed. >>>>>> >> >> >> >>>>>> >> >> >> > In the newer version, I notice lots of null strings being= sent >>>>>> >> >> >> continuously >>>>>> >> >> >> > from the couchdb server to the view server. My view serve= r loop >>>>>> >> >> >> > looks >>>>>> >> >> >> like >>>>>> >> >> >> > the following :- >>>>>> >> >> >> > >>>>>> >> >> >> > while (true) { >>>>>> >> >> >> > =A0s =3D inputstreamreader.readLine >>>>>> >> >> >> > =A0toJson(s) match { >>>>>> >> >> >> > =A0 =A0//.. process reset, add_fun etc. >>>>>> >> >> >> > =A0} >>>>>> >> >> >> > } >>>>>> >> >> >> > >>>>>> >> >> >> > With the new version, I find lots of null strings coming = in to >>>>>> >> >> >> > "s", >>>>>> >> >> which >>>>>> >> >> >> > makes me include something like the following .. >>>>>> >> >> >> > >>>>>> >> >> >> > while (true) { >>>>>> >> >> >> > =A0s =3D inputstreamreader.readLine >>>>>> >> >> >> > =A0if (s =3D=3D null) // ignore >>>>>> >> >> >> > =A0else >>>>>> >> >> >> > =A0toJson(s) match { >>>>>> >> >> >> > =A0 =A0//.. process reset, add_fun etc. >>>>>> >> >> >> > =A0} >>>>>> >> >> >> > } >>>>>> >> >> >> > >>>>>> >> >> >> > And this null business is really huge. Has there been any= change >>>>>> >> >> >> > in >>>>>> >> >> the >>>>>> >> >> >> > protocol between the couchdb server and the view server ?= I >>>>>> >> >> >> > suspect >>>>>> >> >> that >>>>>> >> >> >> > these null exchanges are taking up lots of cycles which r= esult in >>>>>> >> >> process >>>>>> >> >> >> > time out in the new version. I do not get this null stuff= with the >>>>>> >> >> older >>>>>> >> >> >> > version. Is there any chance of such happening with the c= hanges >>>>>> >> >> >> > that >>>>>> >> >> have >>>>>> >> >> >> > been done in couch_query_servers.erl ? >>>>>> >> >> >> > >>>>>> >> >> >> > Thanks. >>>>>> >> >> >> > - Debasish >>>>>> >> >> >> > >>>>>> >> >> >> > >>>>>> >> >> >> > On Sat, Sep 19, 2009 at 11:34 PM, Debasish Ghosh >>>>>> >> >> >> > wrote: >>>>>> >> >> >> > >>>>>> >> >> >> >> actually my ["reset"] is not expensive at all .. it just= has a >>>>>> >> >> >> array.clear >>>>>> >> >> >> >> kind of call. >>>>>> >> >> >> >> Just another observation when I run in debug mode I find= that >>>>>> >> >> >> >> there >>>>>> >> >> are >>>>>> >> >> >> >> quite a few cases of OS Process Error {os_process_error,= "OS >>>>>> >> >> >> >> process >>>>>> >> >> >> timed >>>>>> >> >> >> >> out."} being recorded in couch.log. I do not get this wh= en I am >>>>>> >> >> running >>>>>> >> >> >> the >>>>>> >> >> >> >> earlier version. However no unnatural things appear in >>>>>> >> >> couchdb.stderr. >>>>>> >> >> >> My >>>>>> >> >> >> >> current setting of os_process_timeout is 20000 .. I gues= s that's >>>>>> >> >> >> >> 20 >>>>>> >> >> secs >>>>>> >> >> >> .. >>>>>> >> >> >> >> >>>>>> >> >> >> >> Thanks. >>>>>> >> >> >> >> - Debasish >>>>>> >> >> >> >> >>>>>> >> >> >> >> >>>>>> >> >> >> >> On Sat, Sep 19, 2009 at 10:27 PM, Chris Anderson >>>>>> >> >> >> >> >>>>> >> >> >> >wrote: >>>>>> >> >> >> >> >>>>>> >> >> >> >>> On Sat, Sep 19, 2009 at 5:13 AM, Debasish Ghosh >>>>>> >> >> >> >>> wrote: >>>>>> >> >> >> >>> > Hi - >>>>>> >> >> >> >>> > As I have mentioned previously I have been working on= a Scala >>>>>> >> >> driver >>>>>> >> >> >> for >>>>>> >> >> >> >>> > CouchDB, which also includes a Query Server. I was wo= rking >>>>>> >> >> >> >>> > with an >>>>>> >> >> >> April >>>>>> >> >> >> >>> > snapshot of 2009/04/23. This worked fine for all the = views and >>>>>> >> >> >> >>> validations >>>>>> >> >> >> >>> > that I have written.Things were running fine and I co= uld write >>>>>> >> >> >> >>> map/reduce >>>>>> >> >> >> >>> > and validation functions in Scala. >>>>>> >> >> >> >>> > Recently I tried to upgrade to trunk. Suddenly the vi= ews and >>>>>> >> >> >> validations >>>>>> >> >> >> >>> > became very very slow. After some fact finding, I tri= ed to >>>>>> >> >> >> >>> > poke >>>>>> >> >> into >>>>>> >> >> >> * >>>>>> >> >> >> >>> > couch_query_servers.erl*, since that seemed to be the= obvious >>>>>> >> >> >> >>> > area >>>>>> >> >> to >>>>>> >> >> >> >>> look >>>>>> >> >> >> >>> > into. I may be worng though, but it was a blind guess= . >>>>>> >> >> >> >>> > I noticed that previously I was working with *revisio= n 749852* >>>>>> >> >> >> >>> > of >>>>>> >> >> the >>>>>> >> >> >> >>> file, >>>>>> >> >> >> >>> > which delivered the goods for me. Then when I faced p= roblems >>>>>> >> >> >> >>> > with >>>>>> >> >> the >>>>>> >> >> >> >>> trunk, >>>>>> >> >> >> >>> > I started doing a git reset to earlier versions of th= is file. >>>>>> >> >> >> >>> > Now >>>>>> >> >> I >>>>>> >> >> >> find >>>>>> >> >> >> >>> > that it looks like the performance problem starts fro= m >>>>>> >> >> >> >>> > *revision >>>>>> >> >> >> 780165* >>>>>> >> >> >> >>> of >>>>>> >> >> >> >>> > this file. Have a look at >>>>>> >> >> >> >>> > >>>>>> >> >> >> >>> >>>>>> >> >> >> >>>>>> >> >> >>>>>> >> >> http://svn.apache.org/viewvc/couchdb/trunk/src/couchdb/couch_q= uery_servers.erl?r1=3D780165&r2=3D749852&diff_format=3Dhfor >>>>>> >> >> >> >>> > the difference. Looks like there have been some major= changes. >>>>>> >> >> >> >>> > I >>>>>> >> >> am >>>>>> >> >> >> >>> > just >>>>>> >> >> >> >>> > wondering if this change has anything to do with the >>>>>> >> >> >> >>> > performance >>>>>> >> >> >> issue. >>>>>> >> >> >> >>> > >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> A quick scan of that diff suggests that the only real b= ehavior >>>>>> >> >> change >>>>>> >> >> >> >>> that should effect you is the ["reset"] call for recycl= ed >>>>>> >> >> >> >>> processes. >>>>>> >> >> >> >>> Maybe reset is expensive in your implementation? >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> BTW, have you tried running: >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> spec test/query_server_spec.rb -f specdoc --color >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> It should be simple to extend that test suite to test y= our scala >>>>>> >> >> >> >>> server. If there are patches we can make to make it eas= ier to >>>>>> >> >> >> >>> integrate outside projects with the query server test s= uite, I'm >>>>>> >> >> happy >>>>>> >> >> >> >>> to help there as well. >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> > Any help, pointer will be appreciated. >>>>>> >> >> >> >>> > >>>>>> >> >> >> >>> > Thanks. >>>>>> >> >> >> >>> > - Debasish >>>>>> >> >> >> >>> > >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> >>>>>> >> >> >> >>> -- >>>>>> >> >> >> >>> Chris Anderson >>>>>> >> >> >> >>> http://jchrisa.net >>>>>> >> >> >> >>> http://couch.io >>>>>> >> >> >> >>> >>>>>> >> >> >> >> >>>>>> >> >> >> >> >>>>>> >> >> >> > >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> -- >>>>>> >> >> >> Chris Anderson >>>>>> >> >> >> http://jchrisa.net >>>>>> >> >> >> http://couch.io >>>>>> >> >> >> >>>>>> >> >> > >>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> >> -- >>>>>> >> >> Chris Anderson >>>>>> >> >> http://jchrisa.net >>>>>> >> >> http://couch.io >>>>>> >> >> >>>>>> >> > >>>>>> > >>>>>> > >>>>> >>>> >>> >> > --=20 Chris Anderson http://jchrisa.net http://couch.io