incubator-couchdb-dev mailing list archives

From "Eli Stevens (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COUCHDB-1817) Exceeding the couchjs stack size does not have a clear error message
Date Thu, 06 Jun 2013 17:49:21 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Stevens updated COUCHDB-1817:
---------------------------------

    Description: 
Updated description:

When a large document is uploaded (in this case 45MB of JSON), if the couchjs process runs
out of stack, the error message that gets produced does not make it easy to debug what is
going wrong.
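
One rough way to spot such a document before it reaches the view server is to measure its serialized size. The sketch below is only illustrative: the function names and the 10 MB threshold are assumptions made for the example, not CouchDB settings or limits.

```python
import json

# Hypothetical threshold for illustration only; this is not a CouchDB limit.
SIZE_WARN_BYTES = 10 * 1024 * 1024


def serialized_size(doc):
    """Return the size in bytes of doc when encoded as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def oversized_docs(docs, limit=SIZE_WARN_BYTES):
    """Return (_id, size) pairs for docs whose JSON encoding exceeds limit."""
    sizes = ((doc.get("_id"), serialized_size(doc)) for doc in docs)
    return [(doc_id, size) for doc_id, size in sizes if size > limit]
```

Running oversized_docs over a batch before upload would at least show which document is the outlier, even if the couchjs error itself stays opaque.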


We have started seeing errors crop up in our application that we have not seen before, and
we're at a loss for how to start debugging it.

[~dch] said that we might look into system resource limits, so we started collecting all of
the output from _stats into RRD (along with memory, load, etc. that we were already collecting),
but nothing is jumping out at us as obviously problematic.
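
For reference, turning a _stats response into flat metric names for a time-series store such as RRD might look roughly like the sketch below. It assumes the payload is a two-level JSON object (group, then metric) whose leaves carry numeric fields such as "current" and "mean"; the function name and the sample values are hypothetical.

```python
def flatten_stats(stats, field="current"):
    """Map 'group.metric' to the chosen numeric field, skipping null values."""
    flat = {}
    for group, metrics in stats.items():
        for name, values in metrics.items():
            value = values.get(field)
            if value is not None:
                flat[f"{group}.{name}"] = value
    return flat


# Made-up sample in the assumed shape, not real output from our server.
sample = {
    "couchdb": {"open_os_files": {"description": "d", "current": 12, "mean": 3.5}},
    "httpd": {"requests": {"description": "d", "current": None}},
}
print(flatten_stats(sample))
```

Metrics that have no value yet are skipped, so each sample only writes series that actually have data.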

We can semi-reliably reproduce the problem, but it's far from a minimal test case (basically,
we load up several large chunks of data, and then halfway through the processing run, we get
the error).  The error doesn't seem to happen if we load up each chunk by itself.

The DB in question has about 100 docs in it, none particularly large (nothing over a couple
KB would be my guess), with a couple hundred MB in attachments.  There are about 10 design docs,
written in CoffeeScript.  In general, there isn't anything that seems obviously resource intensive.

We have seen this issue on 1.2.0, 1.2.1, and we're working on getting a machine with 1.3.0
set up (the PPA we'd been using hasn't been updated yet).  Ubuntu 12.04, spinning disk, etc.
 The system is under load when it happens, but the load isn't more than 1.5x the number of
cores.  I don't have disk IO numbers at hand, but I'd be surprised if that was being strained.

Error as it appears in couch.log: https://gist.github.com/wickedgrey/e7fd3fc14b6d43e95564

The design doc in question: https://gist.github.com/wickedgrey/db41b0c3c75a590e2109

An example document: https://gist.github.com/wickedgrey/a8422aab261ddd2ce4fe

We have some preliminary evidence that the problem persists after the system goes quiet, but
we're not certain.

Either CouchDB isn't handling things correctly, in which case this bug is "prz fix" or we're
doing something wrong (hitting a resource limit, or something), in which case this bug is
"prz make the error message more informative".

Thanks!

  was:
We have started seeing errors crop up in our application that we have not seen before, and
we're at a loss for how to start debugging it.

[~dch] Said that we might look into system resource limits, so we started collecting all of
the output from _stats into RRD (along with memory, load, etc. that we were already collecting),
but nothing is jumping out at us as obviously problematic.

We can semi-reliably reproduce the problem, but it's far from a minimal test case (basically,
we load up several large chunks of data, and then halfway through the processing run, we get
the error).  The error doesn't seem to happen if we load up each chunk by itself.

The DB in question has about 100 docs in it, none particularly large (nothing over a couple
KB would be my guess), with a couple hundred MB in attachments.  10ish design docs, coffeescript.
 In general, there isn't anything that seems obviously resource intensive.

We have seen this issue on 1.2.0, 1.2.1, and we're working on getting a machine with 1.3.0
set up (the PPA we'd been using hasn't been updated yet).  Ubuntu 12.04, spinning disk, etc.
 The system is under load when it happens, but the load isn't more than 1.5x the number of
cores.  I don't have disk IO numbers at hand, but I'd be surprised if that was being strained.

Error as it appears in couch.log: https://gist.github.com/wickedgrey/e7fd3fc14b6d43e95564

The design doc in question: https://gist.github.com/wickedgrey/db41b0c3c75a590e2109

An example document: https://gist.github.com/wickedgrey/a8422aab261ddd2ce4fe

We have some preliminary evidence that the problem persists after the system goes quiet, but
we're not certain.

Either CouchDB isn't handling things correctly, in which case this bug is "prz fix" or we're
doing something wrong (hitting a resource limit, or something), in which case this bug is
"prz make the error message more informative".

Thanks!

    
> Exceeding the couchjs stack size does not have a clear error message
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-1817
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1817
>             Project: CouchDB
>          Issue Type: Bug
>          Components: JavaScript View Server
>            Reporter: Eli Stevens
>            Priority: Critical
>         Attachments: couchdb__couchdb_files.png, couchdb__httpd_status_codes.png, couchdb_mem.png, loadavg.png, memory2.png

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
