couchdb-user mailing list archives

From Gustavo Delfino <Gustavo.Delf...@zf.com>
Subject RE: views failing due to fabric_worker_timeout and OS process timed out
Date Wed, 22 Feb 2017 23:30:17 GMT
Adam,

Thank you for your help. Setting [fabric] request_timeout to infinity solved the first problem.

The second problem remains. I studied the size of my documents. I designed the db with
documents as small as possible, but a few of them are large. By doing a HEAD request for
each document, I extracted the Content-Length of all of them. My document sizes look
like this:

Median size: 52k
Average size: 170k
3/4 of the documents are smaller than 229k
9/10 of the documents are smaller than 436k
Max size: 14000k
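
For reference, a minimal sketch of how summary figures like these could be computed once the Content-Length values have been collected into an array of byte counts. The helper names (`percentile`, `summarize`) and the nearest-rank percentile method are assumptions for illustration, not necessarily how the numbers above were produced.

```javascript
// Hypothetical helper: nearest-rank percentile of an already sorted array.
function percentile(sorted, p) {
  if (sorted.length === 0) {
    throw new Error("no sizes given");
  }
  // Nearest rank: the smallest value with at least p% of the data at or below it.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical helper: summary statistics for a list of document sizes in bytes.
function summarize(sizes) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...sizes].sort((a, b) => a - b);
  const total = sorted.reduce((sum, s) => sum + s, 0);
  return {
    median: percentile(sorted, 50),
    average: total / sorted.length,
    p75: percentile(sorted, 75),
    p90: percentile(sorted, 90),
    max: sorted[sorted.length - 1],
  };
}
```

Sorting once and reading the maximum from the end of the sorted array avoids spreading a very large array into `Math.max`, which can fail for hundreds of thousands of elements.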

I have set [couchdb] os_process_timeout to 1000000 and it still fails like this:

[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.4854.995> -------- Starting
index update for db: shards/00000000-1fffffff/vw.1481754819 idx: _design/appname
[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.1391.995> -------- Starting
index update for db: shards/40000000-5fffffff/vw.1481754819 idx: _design/appname
[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.4588.995> -------- Starting
index update for db: shards/60000000-7fffffff/vw.1481754819 idx: _design/appname
[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.3463.995> -------- Starting
index update for db: shards/80000000-9fffffff/vw.1481754819 idx: _design/appname
[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.5295.995> -------- Starting
index update for db: shards/c0000000-dfffffff/vw.1481754819 idx: _design/appname
[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.5037.995> -------- Starting
index update for db: shards/a0000000-bfffffff/vw.1481754819 idx: _design/appname
[info] 2017-02-22T23:16:22.533000Z couchdb@localhost <0.4141.984> -------- Starting
index update for db: shards/20000000-3fffffff/vw.1481754819 idx: _design/appname
(...)
[info] 2017-02-22T23:17:02.547000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.9875.4224> died normal
[error] 2017-02-22T23:17:03.069000Z couchdb@localhost <0.3010.4224> -------- OS Process
Error <0.9875.4224> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:03.078000Z couchdb@localhost emulator -------- Error in process <0.3010.4224>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[error] 2017-02-22T23:17:03.078000Z couchdb@localhost <0.5772.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[error] 2017-02-22T23:17:03.079000Z couchdb@localhost <0.7330.4224> 2fa00b7084 req_err(3183461804)
unknown_error : function_clause
    [<<"couch_mrview_show:list_cb/2 L212">>,<<"fabric_view_map:go/7 L54">>,<<"couch_query_servers:with_ddoc_proc/2
L421">>,<<"chttpd:process_request/1 L293">>,<<"chttpd:handle_request_int/1
L229">>,<<"mochiweb_http:headers/6 L122">>,<<"proc_lib:init_p_do_apply/3
L237">>]
[notice] 2017-02-22T23:17:03.079000Z couchdb@localhost <0.7330.4224> 2fa00b7084 servername:5984
10.217.46.47 undefined GET /vw/_design/appname/_list/signals/trw_id?start_key=%22VDP480001%22;end_key=%22VDP480001\u9999%22;
500 ok 40560
[info] 2017-02-22T23:17:03.592000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.29176.4223> died normal
[error] 2017-02-22T23:17:03.894000Z couchdb@localhost <0.6096.4224> -------- OS Process
Error <0.29176.4223> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:03.895000Z couchdb@localhost <0.6780.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[error] 2017-02-22T23:17:03.901000Z couchdb@localhost emulator -------- Error in process <0.6096.4224>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[info] 2017-02-22T23:17:06.366000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.7753.4224> died normal
[error] 2017-02-22T23:17:06.366000Z couchdb@localhost <0.4459.4224> -------- OS Process
Error <0.7753.4224> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:06.367000Z couchdb@localhost emulator -------- Error in process <0.4459.4224>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[error] 2017-02-22T23:17:06.367000Z couchdb@localhost <0.11272.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[info] 2017-02-22T23:17:08.245000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.8188.4224> died normal
[error] 2017-02-22T23:17:08.245000Z couchdb@localhost <0.28430.4223> -------- OS Process
Error <0.8188.4224> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:08.245000Z couchdb@localhost emulator -------- Error in process <0.28430.4223>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[error] 2017-02-22T23:17:08.319000Z couchdb@localhost <0.5275.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[notice] 2017-02-22T23:17:11.735000Z couchdb@localhost <0.18522.1766> 39ca2df11d servername:5984
149.223.224.36 undefined GET /vw/_changes?feed=continuous&style=all_docs&since=%22482896-g1AAAAJ7eJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rgzmJgeF1by5QjD3NwDjFPNkEmx48JiUpAMkke4Rha8CGJVmkmSWam5NqmAPIsHiEYS0QlxlbpiUakmxYAsiweoRh-8CGWSSaWliaGpBoWB4LkGRoAFJA8-ZDDTwHNtDYzCI1McWCLAMXQAzcDzHwTRAk7IzNU5LSUsky8ADEwPtQF94FG5honmRsaUielx9ADISF4Q2IgYmpaWaWadi0ZgEA3hnNtQ%22&timeout=10000
200 ok 10090
[info] 2017-02-22T23:17:22.138000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.31614.4116> died normal
[error] 2017-02-22T23:17:22.139000Z couchdb@localhost <0.5831.4224> -------- OS Process
Error <0.31614.4116> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:22.148000Z couchdb@localhost emulator -------- Error in process <0.5831.4224>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[error] 2017-02-22T23:17:22.148000Z couchdb@localhost <0.4214.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[notice] 2017-02-22T23:17:22.226000Z couchdb@localhost <0.18522.1766> db25906ce1 servername:5984
149.223.224.36 undefined GET /vw/_changes?feed=continuous&style=all_docs&since=%22482896-g1AAAAJ7eJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rgzmJgeF1by5QjD3NwDjFPNkEmx48JiUpAMkke4Rha8CGJVmkmSWam5NqmAPIsHiEYS0QlxlbpiUakmxYAsiweoRh-8CGWSSaWliaGpBoWB4LkGRoAFJA8-ZDDTwHNtDYzCI1McWCLAMXQAzcDzHwTRAk7IzNU5LSUsky8ADEwPtQF94FG5honmRsaUielx9ADISF4Q2IgYmpaWaWadi0ZgEA3hnNtQ%22&timeout=10000
200 ok 10482
[info] 2017-02-22T23:17:24.372000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.9283.4224> died normal
[error] 2017-02-22T23:17:24.372000Z couchdb@localhost <0.3599.4224> -------- OS Process
Error <0.9283.4224> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:24.391000Z couchdb@localhost emulator -------- Error in process <0.3599.4224>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[error] 2017-02-22T23:17:24.488000Z couchdb@localhost <0.8575.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
[notice] 2017-02-22T23:17:24.964000Z couchdb@localhost <0.3439.4224> 21b9175dee servername:5984
149.223.224.36 undefined GET /vw/_changes?since=%22482896-g1AAAAJ7eJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rgzmJgeF1by5QjD3NwDjFPNkEmx48JiUpAMkke4Rha8CGJVmkmSWam5NqmAPIsHiEYS0QlxlbpiUakmxYAsiweoRh-8CGWSSaWliaGpBoWB4LkGRoAFJA8-ZDDTwHNtDYzCI1McWCLAMXQAzcDzHwTRAk7IzNU5LSUsky8ADEwPtQF94FG5honmRsaUielx9ADISF4Q2IgYmpaWaWadi0ZgEA3hnNtQ%22&limit=0
200 ok 4
[info] 2017-02-22T23:17:25.307000Z couchdb@localhost <0.212.0> -------- couch_proc_manager
<0.4265.4224> died normal
[error] 2017-02-22T23:17:25.307000Z couchdb@localhost <0.25135.4223> -------- OS Process
Error <0.4265.4224> :: {os_process_error,"OS process timed out."}
[error] 2017-02-22T23:17:25.326000Z couchdb@localhost emulator -------- Error in process <0.25135.4223>
on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed
out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater...
[error] 2017-02-22T23:17:25.328000Z couchdb@localhost <0.7179.4224> 2fa00b7084 rexi_server
throw:{os_process_error,"OS process timed out."} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]

Any other idea on what could be wrong?

Regards,

Gustavo Delfino



-----Original Message-----
From: Joan Touzet [mailto:wohali@apache.org] 
Sent: Wednesday, February 22, 2017 1:00 PM
To: user@couchdb.apache.org
Subject: Re: views failing due to fabric_worker_timeout and OS process timed out

> > Any idea of what could be going on? I am running CouchDB 2.0.0.1 
> > under Windows 7 with a single node. I have not modified most of the 
> > default CouchDB settings.
> 
> I haven’t kept pace with the current state of the art in the Windows 
> build, so there may be other platform-specific issues at work with 
> this “OS process timed out”. Cheers,

Not to my knowledge, other than somewhat poorer performance than the same machine running
Linux.

Bump the limits that Adam describes, or get smaller documents; if neither of those helps,
please let us know.

-Joan

-----Original Message-----
From: Adam Kocoloski [mailto:kocolosk@apache.org] 
Sent: Tuesday, February 21, 2017 8:50 PM
To: user@couchdb.apache.org
Subject: Re: views failing due to fabric_worker_timeout and OS process timed out 

Hi Gustavo, there are a couple of things going on here. Let’s address them individually:

> On Feb 21, 2017, at 6:17 PM, Gustavo Delfino <Gustavo.Delfino@zf.com> wrote:
> 
> Hi, I am evaluating using CouchDB and all worked well with a small test database. Now
> I am trying to use it with a much larger database and I am having an issue creating views.
> My view map function is very simple:
> 
> function (doc) {
>    var trw_id;
>    if(doc.customer_id){
>      emit(doc.customer_id, doc._id);
>    }
> }
> 
> With a few hundred documents it works well but not as the size of the db grows (or maybe
> I have an issue with the function above).
> 
> I can see in the log how the shards start working:
> 
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29209.6> -------- Starting index update for db: shards/20000000-3fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29194.6> -------- Starting index update for db: shards/00000000-1fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29191.6> -------- Starting index update for db: shards/60000000-7fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29205.6> -------- Starting index update for db: shards/80000000-9fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29218.6> -------- Starting index update for db: shards/40000000-5fffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29228.6> -------- Starting index update for db: shards/a0000000-bfffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29225.6> -------- Starting index update for db: shards/c0000000-dfffffff/vw.1487715840 idx: _design/appname
> [info] 2017-02-21T22:38:58.788000Z couchdb@localhost <0.29208.6> -------- Starting index update for db: shards/e0000000-ffffffff/vw.1487715840 idx: _design/appname
> 
> I see high CPU activity signaling that the index is being created and suddenly it stops:
> 
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/00000000-1fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/20000000-3fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/40000000-5fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/60000000-7fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/80000000-9fffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/a0000000-bfffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/c0000000-dfffffff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 fabric_worker_timeout map_view,couchdb@localhost,<<"shards/e0000000-ffffffff/vw.1487715840">>

These timeouts are the expected behavior in 2.0 when a request for a view hits a configurable
limit. The default timeout is 60 seconds. I believe this may have been a change from 1.x where
the socket would sit open as long as necessary. If you need to recover that behavior you can
set

[fabric]
request_timeout = infinity

You could also configure some other number in milliseconds:

; Give up after 10 seconds
[fabric]
request_timeout = 10000 

In any case the indexing jobs should have continued even after this timeout. I think that’s
why the request worked when you reloaded the page.

> [error] 2017-02-21T22:39:59.000000Z couchdb@localhost <0.19734.6> d4985a33d1 req_err(1329706011) unknown_error : function_clause
>    [<<"couch_mrview_show:list_cb/2 L212">>,<<"fabric_view_map:go/7 L52">>,<<"couch_query_servers:with_ddoc_proc/2 L421">>,<<"chttpd:process_request/1 L293">>,<<"chttpd:handle_request_int/1 L229">>,<<"mochiweb_http:headers/6 L122">>,<<"proc_lib:init_p_do_apply/3 L237">>]
> [notice] 2017-02-21T22:39:59.002000Z couchdb@localhost <0.19734.6> d4985a33d1 127.0.0.1:5984 127.0.0.1 undefined GET /dbname/_design/appname/_list/data/customer_id?key=%22PRIV-SE270_FC_AZT10L16_016%22 500 ok 60218

Here the “60218” number is the response time in milliseconds, which confirms that you
bumped into the default timeout. However, you should have gotten something more informative
than this function_clause error response. That’s a bug in our error handling; if you like
I’d encourage you to file a bug report:

https://issues.apache.org/jira/browse/COUCHDB

You’ll need an account first if you don’t already have one:

https://issues.apache.org/jira/secure/Signup!default.jspa
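
As a side note on the request log line discussed above, here is a rough sketch of how one might check whether a failed request ran into the fabric request timeout. It assumes the `[notice]` line ends in `<status> ok <milliseconds>`, as the lines in this thread do; the helper names are hypothetical, and the 60000 ms constant is simply the default mentioned earlier.

```javascript
// Assumed default for [fabric] request_timeout, in milliseconds.
const DEFAULT_REQUEST_TIMEOUT_MS = 60000;

// Hypothetical helper: read the HTTP status and response time from the tail
// of a request log line such as "... GET /db/_design/x/_list/y/z 500 ok 60218".
function parseRequestTail(line) {
  const match = /(\d{3}) ok (\d+)\s*$/.exec(line);
  if (!match) {
    return null; // Not a request line in the assumed format.
  }
  return { status: Number(match[1]), elapsedMs: Number(match[2]) };
}

// Hypothetical helper: true when a server error took at least as long as the
// configured request timeout, which suggests the timeout was the trigger.
function hitRequestTimeout(line, timeoutMs = DEFAULT_REQUEST_TIMEOUT_MS) {
  const parsed = parseRequestTail(line);
  return parsed !== null && parsed.status >= 500 && parsed.elapsedMs >= timeoutMs;
}
```

A 500 response that returns well before the timeout would come back as `false` here, which points at a different cause than the request timeout.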

> In the web browser, I get this:
> 
> {"error":"unknown_error","reason":"function_clause","ref":1329706011}
> 
> When the error happened, I was also replicating from another CouchDB server that has
> a large number of documents. I was running a test requesting the views as the db was getting
> filled in to see at what point I started to get the issue. So I started seeing the issue with
> about 17k documents (0.7GB).
> 
> I have just reloaded the page and it now works, but I have not been 
> able to make the view work on another machine with my complete DB 
> which is much bigger (1/2 million docs, 22GB)
> 
> This is what I see in the log in the machine with the large database:
> 
> [info] 2017-02-21T23:08:26.127000Z couchdb@localhost <0.4854.995> 
> -------- Starting index update for db: 
> shards/00000000-1fffffff/vw.1481754819 idx: _design/adag

<snip>

> [info] 2017-02-21T23:09:14.252000Z couchdb@localhost <0.212.0> -------- couch_proc_manager <0.9345.4064> died normal
> [error] 2017-02-21T23:09:14.253000Z couchdb@localhost <0.11021.4075> -------- OS Process Error <0.9345.4064> :: {os_process_error,"OS process timed out."}
> [error] 2017-02-21T23:09:14.376000Z couchdb@localhost emulator -------- Error in process <0.11021.4075> on node 'couchdb@localhost' with exit value: {{nocatch,{os_process_error,"OS process timed out."}},[{couch_os_process,prompt,2,[{file,"src/couch_os_process.erl"},{line,59}]},{couch_query_servers,map_doc_raw,2,[{file,"src/couch_query_servers.erl"},{line,67}]},{couch_mrview_updater…

Now *this* is a separate issue. The “OS process timed out” error can be caused by a lot
of things, but one of the most common is a large JSON document. I’ve seen documents around,
say, 10 MB cause this timeout. Any chance you’ve got some of those hanging around? Again
this is a configurable value which defaults to 5 seconds:

; Allow the system 20 seconds to process a document in a view
[couchdb]
os_process_timeout = 20000

At some point though this is a losing battle. Better to keep the documents under 1 MB if you
have that flexibility.
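
To find the documents worth splitting up, something along these lines could be run over a list of document ids and sizes gathered separately. This is only a sketch: the `{ id, size }` record shape and the `findOversized` name are assumptions, and the 1 MB default is a rule of thumb rather than a hard limit.

```javascript
// Rule-of-thumb size limit in bytes (1 MB).
const ONE_MB = 1024 * 1024;

// Hypothetical helper: return the documents larger than limitBytes,
// largest first, from an array of { id, size } records (size in bytes).
function findOversized(docSizes, limitBytes = ONE_MB) {
  return docSizes
    .filter((doc) => doc.size > limitBytes)
    .sort((a, b) => b.size - a.size);
}
```

The largest documents come first, since those are the most likely to exceed the OS process timeout during a view build.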

> Any idea of what could be going on? I am running CouchDB 2.0.0.1 under Windows 7 with
> a single node. I have not modified most of the default CouchDB settings.

I haven’t kept pace with the current state of the art in the Windows build, so there may
be other platform-specific issues at work with this “OS process timed out”. Cheers,

Adam

> 
> Regards,
> 
> Gustavo Delfino
> 
