couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Query server performance issues ..
Date Wed, 23 Sep 2009 19:02:17 GMT
On Wed, Sep 23, 2009 at 11:27 AM, Debasish Ghosh
<ghosh.debasish@gmail.com> wrote:
> Looks like I found the problem .. is this an intended change in CouchDB ..
>
> CouchDB wiki documents the following for processing "reset" by the view server :
>
> """
> CouchDB sends:
> ["reset"]\n
> The view server responds:
> true\n
> """
> Accordingly I was doing a pattern match in my query server, expecting
> ["reset"] ..
>
> In the latest snapshot, I did a trace and found that CouchDB actually
> sends ["reset", {"reduce_limit":true}] .. hence I was getting an error
> and the query server closes every time ..
>
> Is this the current specification of "reset" ? I changed my code to do
> the corresponding pattern match .. and it now runs fine!
>
> Please confirm.
>

The query_server_spec.rb should cover this, but perhaps it doesn't. I
guess part of the issue is that JS is flexible about function arity in
a way that other languages are not, which makes it really easy to
absorb these sorts of differences.

The command is currently ["reset", query_server_options] although the
JS server works fine if it just gets ["reset"] as well.
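
For a query server written in a stricter language, one way to absorb both
forms is to match on the command name and treat anything after it as
optional. The Scala sketch below is illustrative only: the object name
ResetSketch, the helpers respond and serve, and the regex check (used here
instead of a real JSON parser) are assumptions, not part of CouchDB or of
the Scala driver discussed in this thread. It also stops reading once
readLine returns null, since that indicates the input stream was closed.

```scala
import java.io.BufferedReader

object ResetSketch {
  // Hypothetical, deliberately naive check: a JSON array whose first element
  // is the string "reset", optionally followed by further arguments such as
  // the query_server_options object. A real server would parse the JSON.
  private val Reset = """^\s*\[\s*"reset"\s*(,.*)?\]\s*$""".r

  /** Reply for one command line, or None if this sketch does not handle it. */
  def respond(line: String): Option[String] = line match {
    case Reset(_) => Some("true") // covers ["reset"] and ["reset", {...}]
    case _        => None         // add_fun, map_doc, etc. are out of scope here
  }

  /** Reads commands until readLine returns null (stream closed), then returns. */
  def serve(in: BufferedReader, out: String => Unit): Unit = {
    var line = in.readLine()
    while (line != null) {
      respond(line).foreach(out)
      line = in.readLine()
    }
  }
}
```

With this shape, ["reset", {"reduce_limit":true}] gets the same reply as a
bare ["reset"]. A real server would presumably parse the options object
rather than discard it.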

Chris

> Thanks.
> - Debasish
>
> On Wed, Sep 23, 2009 at 12:07 AM, Debasish Ghosh
> <ghosh.debasish@gmail.com> wrote:
>> Thanks for the suggestions. I have not yet tried query_server_spec.rb.
>> Will do soon to check. Though I logged everything that goes between
>> couch server and the query server. The query server does get null from
>> readLine of System.in with the later snapshots of the codebase that
>> shuts it down. I need to investigate more on how it gets this. But as
>> I mentioned before as well, the same query server runs fine with the
>> earlier snapshot.
>>
>> Will let u know if I find anything meaningful.
>>
>> Thanks.
>> - Debasish
>>
>> On Tue, Sep 22, 2009 at 11:19 PM, Paul Davis
>> <paul.joseph.davis@gmail.com> wrote:
>>> On Tue, Sep 22, 2009 at 1:11 PM, Debasish Ghosh
>>> <ghosh.debasish@gmail.com> wrote:
>>>>> It may be that we're flushing the socket with no data, and the Scala
>>>>> server is interpreting that as null input. The JS client uses
>>>>> readline() implemented in C, so it shouldn't have access to data until
>>>>> a line break has been sent by CouchDB.
>>>>
>>>> readLine blocks .. right .. and only comes out with the null input.
>>>> The question is how it gets this null string with the new version of
>>>> CouchDB.
>>>> Is there something different that you were doing in earlier versions.
>>>> Just wondering how it still runs with an earlier snapshot of CouchDB
>>>> ..
>>>>
>>>> Thanks.
>>>> - Debasish
>>>>
>>>
>>> I'm still leaning towards the theory that your server is returning
>>> something that CouchDB doesn't expect. When this happens the Erlang
>>> process controller will shut down the view server by closing its input
>>> stream. Though, theoretically, couchspawnkillable should kill -KILL
>>> the process too, unless there's a tad bit of delay that occurs during
>>> which you're spinning over the stdin stream returning NULL.
>>>
>>> Did you ever try adding tests to query_server_spec.rb and running that
>>> way? I still need to modify that to make it more friendly to run
>>> external view engines, but with a bit of hacking it should at least
>>> point to the inconsistency.
>>>
>>> Paul Davis
>>>
>>>> On Mon, Sep 21, 2009 at 6:07 PM, Debasish Ghosh
>>>> <ghosh.debasish@gmail.com> wrote:
>>>>>
>>>>> The actual code is something like this ..
>>>>> var s = isr.readLine
>>>>> while (s != null) {
>>>>>     // do stuff
>>>>>     s = isr.readLine
>>>>> }
>>>>> I wrote the other version just to log what I get back. Now this same
>>>>> version works ok with the earlier version of the couchdb server. That's
>>>>> what beats me here ..
>>>>> Thanks.
>>>>> - Debasish
>>>>>
>>>>> On Mon, Sep 21, 2009 at 5:46 PM, Robert Newson
>>>>> <robert.newson@gmail.com> wrote:
>>>>>>
>>>>>> I claim you are ignoring null here because of your comment;
>>>>>>
>>>>>> while (true) {
>>>>>>  s = inputstreamreader.readLine
>>>>>>  if (s == null) // ignore
>>>>>>  else
>>>>>>  toJson(s) match {
>>>>>>   //.. process reset, add_fun etc.
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>> When System.in is closed this loop will spin; readLine() will always
>>>>>> return null. Since System.in is only closed when the JVM is exiting,
>>>>>> it is never correct to ignore it and continue processing.
>>>>>>
>>>>>> The loop I presented is not the same as yours as mine will correctly
>>>>>> exit on process termination.
>>>>>>
>>>>>> readLine() *cannot* return null under any circumstance but the close
>>>>>> of the stream (couchdb cannot pass you null this way). System.in is
>>>>>> never closed unless the process itself is exiting, and it is never
>>>>>> reopened.
>>>>>>
>>>>>> The mishandling of readLine() is probably hiding the real problem. I
>>>>>> would guess you pass invalid JSON to couchdb, or fail to return
>>>>>> anything at all, under some conditions. Couch then kills your view
>>>>>> server (and would then restart it). The view server, rather than
>>>>>> gracefully exiting when this happens, will simply spin, never exiting.
>>>>>>
>>>>>> B.
>>>>>>
>>>>>> On Mon, Sep 21, 2009 at 8:19 AM, Debasish Ghosh
>>>>>> <ghosh.debasish@gmail.com> wrote:
>>>>>> > It's in fact referring to a reader that wraps System.in.
>>>>>> > readLine returns null on end of file, but the earlier version of the
>>>>>> > snapshot handles it and does not close the query server process. While
>>>>>> > the new server seems to get throttled in the while loop. In fact this
>>>>>> > is one difference that I forgot to mention. In the earlier version the
>>>>>> > query server does not close, while in the new version it gets closed
>>>>>> > and restarted for every view operation. Maybe it's getting closed
>>>>>> > because of the null. I can figure that out from the logs. Is this an
>>>>>> > intentional change in implementation ?
>>>>>> > Robert -
>>>>>> > I am not ignoring null. The while loop is very similar to what u
>>>>>> > mention. I switched to the while true version just to log and see if
>>>>>> > nulls are getting returned.
>>>>>> > Thanks.
>>>>>> > - Debasish
>>>>>> >
>>>>>> > On Mon, Sep 21, 2009 at 3:53 AM, Paul Davis <paul.joseph.davis@gmail.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> On Sun, Sep 20, 2009 at 1:34 AM, Debasish Ghosh
>>>>>> >> <ghosh.debasish@gmail.com> wrote:
>>>>>> >> > Chris -
>>>>>> >> > In my query server code, I logged everything that gets exchanged
>>>>>> >> > between the couchdb server process and the query server. The
>>>>>> >> > difference that I noticed with the new changes is that the couchdb
>>>>>> >> > server sends a huge number of null strings to the view server which
>>>>>> >> > chokes the latter. In the snippet that I wrote before ..
>>>>>> >> >
>>>>>> >> > while (true) {
>>>>>> >> >>> >  s = inputstreamreader.readLine  // this reads from stdin
>>>>>> >> >>> >  if (s == null) // ignore
>>>>>> >> >>> >  else
>>>>>> >> >>> >  toJson(s) match {
>>>>>> >> >>> >    //.. process reset, add_fun etc.
>>>>>> >> >>> >  }
>>>>>> >> >>> > }
>>>>>> >> >
>>>>>> >>
>>>>>> >> Does inputstreamreader.readLine refer to this function:
>>>>>> >>
>>>>>> >>
>>>>>> >> http://java.sun.com/j2se/1.5.0/docs/api/java/io/BufferedReader.html#readLine%28%29
>>>>>> >>
>>>>>> >> If so, and that's returning null, then is it signaling that CouchDB
>>>>>> >> has tried to close the input stream?
>>>>>> >>
>>>>>> >> Paul
>>>>>> >>
>>>>>> >> > I put logs in the true branch of if (s == null) and moments later I
>>>>>> >> > found a log created of size 10 MB where the view server gets null
>>>>>> >> > strings from stdin. This may give some clues towards the problem.
>>>>>> >> >
>>>>>> >> > Hope this helps.
>>>>>> >> > - Debasish
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Sun, Sep 20, 2009 at 10:56 AM, Chris Anderson <jchris@apache.org>
>>>>>> >> > wrote:
>>>>>> >> >
>>>>>> >> >> On Sat, Sep 19, 2009 at 10:09 PM, Debasish Ghosh
>>>>>> >> >> <ghosh.debasish@gmail.com> wrote:
>>>>>> >> >> > Yes, actually the reason I brought it up is that the same query
>>>>>> >> >> > server runs fine with the earlier version, while it stumbles with
>>>>>> >> >> > the changes incorporated later. Actually there is a really really
>>>>>> >> >> > big difference in performance which is primarily because of the
>>>>>> >> >> > timeouts. Thanks for deciding to look into it. I will currently
>>>>>> >> >> > stick around with the April snapshot. Please post your findings on
>>>>>> >> >> > this list - I will be happy to upgrade to the latest.
>>>>>> >> >> > Thanks.
>>>>>> >> >> > - Debasish
>>>>>> >> >>
>>>>>> >> >> I think what we'll need is a way to get visibility between the beam
>>>>>> >> >> process and the query server. this could be accomplished with a
>>>>>> >> >> simple log wrapper around the query server, logging both stdin and
>>>>>> >> >> stdout to individual files.
>>>>>> >> >>
>>>>>> >> >> I like the idea of implementing it as a wrapper because then we can
>>>>>> >> >> wrap it around the scala as well as the JS query server (and other
>>>>>> >> >> languages), and get complete transparency into what's going over
>>>>>> >> >> the wire.
>>>>>> >> >>
>>>>>> >> >> This is definitely turning into dev@ territory so I'm moving this
>>>>>> >> >> thread there.
>>>>>> >> >>
>>>>>> >> >> Chris
>>>>>> >> >>
>>>>>> >> >> >
>>>>>> >> >> > On Sun, Sep 20, 2009 at 3:41 AM, Chris Anderson <jchris@apache.org>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>> >> >> >> On Sat, Sep 19, 2009 at 11:40 AM, Debasish Ghosh
>>>>>> >> >> >> <ghosh.debasish@gmail.com> wrote:
>>>>>> >> >> >> > Here are some additional behavior changes that I am noticing
>>>>>> >> >> >> > between the 2 versions ..
>>>>>> >> >> >>
>>>>>> >> >> >> The other big change is in couch_os_process, the addition of
>>>>>> >> >> >> couchspawnkillable - maybe this is acting up on your system.
>>>>>> >> >> >>
>>>>>> >> >> >> Partially I'm interested in getting to the bottom of this because
>>>>>> >> >> >> it could be that it's inefficient with the JS query server, but
>>>>>> >> >> >> not causing errors, and we just haven't noticed.
>>>>>> >> >> >>
>>>>>> >> >> >> > In the newer version, I notice lots of null strings being sent
>>>>>> >> >> >> > continuously from the couchdb server to the view server. My view
>>>>>> >> >> >> > server loop looks like the following :-
>>>>>> >> >> >> >
>>>>>> >> >> >> > while (true) {
>>>>>> >> >> >> >  s = inputstreamreader.readLine
>>>>>> >> >> >> >  toJson(s) match {
>>>>>> >> >> >> >    //.. process reset, add_fun etc.
>>>>>> >> >> >> >  }
>>>>>> >> >> >> > }
>>>>>> >> >> >> >
>>>>>> >> >> >> > With the new version, I find lots of null strings coming in to
>>>>>> >> >> >> > "s", which makes me include something like the following ..
>>>>>> >> >> >> >
>>>>>> >> >> >> > while (true) {
>>>>>> >> >> >> >  s = inputstreamreader.readLine
>>>>>> >> >> >> >  if (s == null) // ignore
>>>>>> >> >> >> >  else
>>>>>> >> >> >> >  toJson(s) match {
>>>>>> >> >> >> >    //.. process reset, add_fun etc.
>>>>>> >> >> >> >  }
>>>>>> >> >> >> > }
>>>>>> >> >> >> >
>>>>>> >> >> >> > And this null business is really huge. Has there been any change
>>>>>> >> >> >> > in the protocol between the couchdb server and the view server ?
>>>>>> >> >> >> > I suspect that these null exchanges are taking up lots of cycles
>>>>>> >> >> >> > which result in process time out in the new version. I do not
>>>>>> >> >> >> > get this null stuff with the older version. Is there any chance
>>>>>> >> >> >> > of such happening with the changes that have been done in
>>>>>> >> >> >> > couch_query_servers.erl ?
>>>>>> >> >> >> >
>>>>>> >> >> >> > Thanks.
>>>>>> >> >> >> > - Debasish
>>>>>> >> >> >> >
>>>>>> >> >> >> >
>>>>>> >> >> >> > On Sat, Sep 19, 2009 at 11:34 PM, Debasish Ghosh
>>>>>> >> >> >> > <ghosh.debasish@gmail.com> wrote:
>>>>>> >> >> >> >
>>>>>> >> >> >> >> actually my ["reset"] is not expensive at all .. it just has a
>>>>>> >> >> >> >> array.clear kind of call.
>>>>>> >> >> >> >> Just another observation when I run in debug mode I find that
>>>>>> >> >> >> >> there are quite a few cases of OS Process Error
>>>>>> >> >> >> >> {os_process_error, "OS process timed out."} being recorded in
>>>>>> >> >> >> >> couch.log. I do not get this when I am running the earlier
>>>>>> >> >> >> >> version. However no unnatural things appear in couchdb.stderr.
>>>>>> >> >> >> >> My current setting of os_process_timeout is 20000 .. I guess
>>>>>> >> >> >> >> that's 20 secs ..
>>>>>> >> >> >> >>
>>>>>> >> >> >> >> Thanks.
>>>>>> >> >> >> >> - Debasish
>>>>>> >> >> >> >>
>>>>>> >> >> >> >>
>>>>>> >> >> >> >> On Sat, Sep 19, 2009 at 10:27 PM, Chris Anderson
>>>>>> >> >> >> >> <jchris@apache.org> wrote:
>>>>>> >> >> >> >>
>>>>>> >> >> >> >>> On Sat, Sep 19, 2009 at 5:13 AM, Debasish Ghosh
>>>>>> >> >> >> >>> <ghosh.debasish@gmail.com> wrote:
>>>>>> >> >> >> >>> > Hi -
>>>>>> >> >> >> >>> > As I have mentioned
previously I have been working on a Scala
>>>>>> >> >> driver
>>>>>> >> >> >> for
>>>>>> >> >> >> >>> > CouchDB, which also
includes a Query Server. I was working
>>>>>> >> >> >> >>> > with an
>>>>>> >> >> >> April
>>>>>> >> >> >> >>> > snapshot of 2009/04/23.
This worked fine for all the views and
>>>>>> >> >> >> >>> validations
>>>>>> >> >> >> >>> > that I have written.Things
were running fine and I could write
>>>>>> >> >> >> >>> map/reduce
>>>>>> >> >> >> >>> > and validation functions
in Scala.
>>>>>> >> >> >> >>> > Recently I tried to
upgrade to trunk. Suddenly the views and
>>>>>> >> >> >> validations
>>>>>> >> >> >> >>> > became very very slow.
After some fact finding, I tried to
>>>>>> >> >> >> >>> > poke
>>>>>> >> >> into
>>>>>> >> >> >> *
>>>>>> >> >> >> >>> > couch_query_servers.erl*,
since that seemed to be the obvious
>>>>>> >> >> >> >>> > area
>>>>>> >> >> to
>>>>>> >> >> >> >>> look
>>>>>> >> >> >> >>> > into. I may be worng
though, but it was a blind guess.
>>>>>> >> >> >> >>> > I noticed that previously
I was working with *revision 749852*
>>>>>> >> >> >> >>> > of
>>>>>> >> >> the
>>>>>> >> >> >> >>> file,
>>>>>> >> >> >> >>> > which delivered the
goods for me. Then when I faced problems
>>>>>> >> >> >> >>> > with
>>>>>> >> >> the
>>>>>> >> >> >> >>> trunk,
>>>>>> >> >> >> >>> > I started doing a git
reset to earlier versions of this file.
>>>>>> >> >> >> >>> > Now
>>>>>> >> >> I
>>>>>> >> >> >> find
>>>>>> >> >> >> >>> > that it looks like the
performance problem starts from
>>>>>> >> >> >> >>> > *revision
>>>>>> >> >> >> 780165*
>>>>>> >> >> >> >>> of
>>>>>> >> >> >> >>> > this file. Have a look
at
>>>>>> >> >> >> >>> >
>>>>>> >> >> >> >>>
>>>>>> >> >> >>
>>>>>> >> >>
>>>>>> >> >> http://svn.apache.org/viewvc/couchdb/trunk/src/couchdb/couch_query_servers.erl?r1=780165&r2=749852&diff_format=hfor
>>>>>> >> >> >> >>> > the difference. Looks
like there have been some major changes.
>>>>>> >> >> >> >>> > I
>>>>>> >> >> am
>>>>>> >> >> >> >>> > just
>>>>>> >> >> >> >>> > wondering if this change
has anything to do with the
>>>>>> >> >> >> >>> > performance
>>>>>> >> >> >> issue.
>>>>>> >> >> >> >>> >
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> A quick scan of that diff suggests that the only real behavior
>>>>>> >> >> >> >>> change that should affect you is the ["reset"] call for recycled
>>>>>> >> >> >> >>> processes. Maybe reset is expensive in your implementation?
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> BTW, have you tried running:
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> spec test/query_server_spec.rb -f specdoc --color
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> It should be simple to extend that test suite to test your scala
>>>>>> >> >> >> >>> server. If there are patches we can make to make it easier to
>>>>>> >> >> >> >>> integrate outside projects with the query server test suite, I'm
>>>>>> >> >> >> >>> happy to help there as well.
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> > Any help, pointer will be appreciated.
>>>>>> >> >> >> >>> >
>>>>>> >> >> >> >>> > Thanks.
>>>>>> >> >> >> >>> > - Debasish
>>>>>> >> >> >> >>> >
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>> --
>>>>>> >> >> >> >>> Chris Anderson
>>>>>> >> >> >> >>> http://jchrisa.net
>>>>>> >> >> >> >>> http://couch.io
>>>>>> >> >> >> >>>
>>>>>> >> >> >> >>
>>>>>> >> >> >> >>
>>>>>> >> >> >> >
>>>>>> >> >> >>
>>>>>> >> >> >>
>>>>>> >> >> >>
>>>>>> >> >> >> --
>>>>>> >> >> >> Chris Anderson
>>>>>> >> >> >> http://jchrisa.net
>>>>>> >> >> >> http://couch.io
>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Chris Anderson
>>>>>> >> >> http://jchrisa.net
>>>>>> >> >> http://couch.io
>>>>>> >> >>
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>
>>>>
>>>
>>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
