incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: couchdb returning truncated JSON
Date Mon, 20 Aug 2012 16:04:30 GMT

You say in the other thread that oomkiller took out couchdb, which would obviously stop it
from sending more data. The TCP socket should close, though, and you should be able to detect
that.
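A quick way to see why the truncation went unnoticed: the captured response further down the thread is HTTP/1.0 with no Content-Length, so the body is delimited only by the connection closing, and a truncated body arrives "cleanly". The first visible symptom is that the JSON fails to parse. A minimal Python sketch (the payload string here is an invented truncated fragment, not the actual data):

```python
import json

# An HTTP/1.0 response without Content-Length is delimited only by the
# server closing the socket, so a truncated body looks like a normal
# end-of-stream and the only signal left is a JSON parse failure.
truncated_body = '{"total_rows":14829491,"offset":12357523,"rows":['  # cut off mid-stream

try:
    json.loads(truncated_body)
    parse_ok = True
except json.JSONDecodeError as err:
    parse_ok = False
    print(f"truncated JSON detected near offset {err.pos}")
```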

I think you have http/1.0 and 1.1 confused. A 1.0 connection is supposed to always close after
the response (The "Connection: Keep-Alive" header for 1.0 is an off-spec hack), a 1.1 connection
is supposed to stay open for more requests unless asked to close ("Connection: close") by
either end.
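To make the 1.0/1.1 difference concrete: with HTTP/1.1 chunked transfer encoding, the framing itself tells the client whether the stream ended early, because a complete body must finish with a zero-size chunk. A sketch in Python, using a throwaway local server to simulate a connection that dies mid-response (the response payload is invented for illustration):

```python
import http.client
import socket
import threading

# A chunked HTTP/1.1 response that dies before the terminating
# zero-size chunk. The 16-byte payload is a made-up fragment.
TRUNCATED = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"10\r\n"                 # next chunk declared as 0x10 = 16 bytes
    b'{"rows":[{"id":1'       # ...and then the server goes away
)

def serve_once(srv):
    conn, _ = srv.accept()
    conn.recv(4096)           # read (and ignore) the request
    conn.sendall(TRUNCATED)
    conn.close()              # abrupt close, no final "0\r\n\r\n"

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

client = http.client.HTTPConnection("127.0.0.1", srv.getsockname()[1])
client.request("GET", "/db/_design/d/_view/v")
resp = client.getresponse()
try:
    resp.read()
    detected = False
except http.client.IncompleteRead as err:
    partial = err.partial     # the bytes received before the cut
    detected = True           # chunked framing exposes the early close
    print(f"truncated after {len(partial)} bytes")
client.close()
```

With the HTTP/1.0-style close-delimited body above, the same early close would be indistinguishable from a normal end of response.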

B.


On 20 Aug 2012, at 16:56, Tim Tisdall wrote:

> I don't think that's the case, and here's why: if the process
> receiving the data had died due to lack of resources, I don't think my
> PHP script would have continued running and then tried to parse the
> JSON into a PHP object.  Most likely it would just die.  I haven't had
> a chance yet, but I should try a few other options, such as using
> http/1.1, which should explicitly close the connection (implying
> couchdb closed the connection before finishing the JSON).  However, I
> think it probably is resource related, as smaller amounts seem to work
> with no problem, like you pointed out.
> 
> -Tim
> 
> On Mon, Aug 20, 2012 at 6:15 AM, CGS <cgsmcmlxxv@gmail.com> wrote:
>> Robert, inject a very large number of docs into a database and try to
>> access _all_docs with a browser (the behavior is similar in a Linux
>> terminal if you capture the cURL output in an environment variable and, I
>> suppose, in every OS). You will see that the browser crashes (or becomes
>> unresponsive) when the RAM becomes insufficient. If you access the same
>> page via AJAX and inspect it with Firebug (or the JavaScript console in
>> Google Chrome), you will see the JSON truncated (while the console reports
>> an error in the JSON/webpage). Given those two behaviors, in any
>> environment where a string is limited to a certain number of characters,
>> the received JSON is truncated (depending on the environment, either the
>> connection is closed and the string is reported as-is, or the whole
>> environment is killed and the response comes back empty). That does not
>> mean CouchDB is sending it that way (cURL output redirected to a file will
>> show that the returned JSON is complete).
>> 
>> In my case, I need about 10 M docs to reproduce that behavior, but I have
>> a few GB of RAM. In Tim's case, with ~380 MB of RAM and 2.5 M rows to
>> fetch, it seems he has hit the maximum hardware capabilities of his
>> system.
>> 
>> Another indication of this is the part of his report where he asks for
>> only 100 rows, which works without any problem.
>> 
>> Sure, it is never a bad idea to check the CouchDB source code for
>> errors. But I am not sure you will find any in this specific case.
>> 
>> CGS
>> 
>> 
>> 
>> 
>> On Sun, Aug 19, 2012 at 2:13 PM, Robert Newson <rnewson@apache.org> wrote:
>> 
>>> 
>>> 
>>> A full view response should always be valid JSON, so that does point
>>> towards a bug in CouchDB assuming the response output you posted is
>>> verbatim what CouchDB returned. Can you reproduce this reliably?
>>> 
>>> Oh, and unrelated to the bug itself, but R14A is a beta release, you
>>> should upgrade.
>>> 
>>> B.
>>> 
>>> On 19 Aug 2012, at 11:05, CGS wrote:
>>> 
>>>> I don't think the problem is coming from CouchDB, but from your
>>>> external environment, which has a limited number of characters per line
>>>> and truncates the message beyond that. It has happened to me on a few
>>>> occasions (a Linux terminal truncated my message because it was a single
>>>> line - the line end was translated as "\n" inside the message, so no
>>>> real line break was actually registered).
>>>> 
>>>> CGS
>>>> 
>>>> 
>>>> On Sun, Aug 19, 2012 at 1:46 AM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>> 
>>>>> I have a script where I query a view for multiple entries of data.
>>>>> I'm doing it in batches of 1000.  It works fine multiple times and
>>>>> then suddenly it returns a result that doesn't properly parse as JSON
>>>>> because it's missing some content at the end (not sure how much, but
>>>>> it's at least missing the final bracket to make it complete).
>>>>> 
>>>>> My logs don't point out any problem...
>>>>> 
>>>>> [Sat, 18 Aug 2012 22:14:08 GMT] [debug] [<0.26768.3>] 'POST'
>>>>> /app_stats/_design/processing/_view/blanks {1,0} from "127.0.0.1"
>>>>> Headers: [{'Content-Length',"14010"},
>>>>>         {'Content-Type',"application/json"},
>>>>>         {'Host',"localhost"}]
>>>>> [Sat, 18 Aug 2012 22:14:08 GMT] [debug] [<0.26768.3>] OAuth Params: []
>>>>> [Sat, 18 Aug 2012 22:14:08 GMT] [debug] [<0.26768.3>] request_group
>>>>> {Pid, Seq} {<0.20450.3>,95240673}
>>>>> [Sat, 18 Aug 2012 22:14:08 GMT] [info] [<0.26768.3>] 127.0.0.1 - -
>>>>> POST /app_stats/_design/processing/_view/blanks 200
>>>>> 
>>>>> Here's what I received from couchdb:
>>>>> 
>>>>> HTTP/1.0 200 OK
>>>>> Server: CouchDB/1.2.0 (Erlang OTP/R14A)
>>>>> ETag: "B25M1ITCCF4RKMFE87QMQ1N3M"
>>>>> Date: Sat, 18 Aug 2012 22:22:10 GMT
>>>>> Content-Type: text/plain; charset=utf-8
>>>>> Cache-Control: must-revalidate
>>>>> 
>>>>> {"total_rows":14829491,"offset":12357523,"rows":[
>>>>> {"id":"34049664743","key":"34049664743","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34049674790","key":"34049674790","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34049683784","key":"34049683784","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34049710675","key":"34049710675","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>>     [  **  SNIP **  ]
>>>>> {"id":"34082476762","key":"34082476762","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34082494494","key":"34082494494","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34082507402","key":"34082507402","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34082533553","key":"34082533553","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34082612840","key":"34082612840","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34082621527","key":"34082621527","value":[{"start_date":"2012-08-05","end_date":null}]},
>>>>> {"id":"34082680993","key":"34082680993","value":[{"start_date":"2012-08-05","end_date":null}]}
>>>>> 
>>>>> it seems to consistently truncate at a point where the next character
>>>>> should either be another comma or a closing square bracket (to close
>>>>> the "rows" array).
>>>>> 
>>>>> I tried changing the script to do batches of 100 and it seems to be
>>>>> running without problems.  Shouldn't there be some sort of error,
>>>>> though?
>>>>> 
>>> 
>>> 

