couchdb-user mailing list archives

From Carlos Alonso <carlos.alo...@cabify.com>
Subject Re: Trying to understand why a node gets 'frozen'
Date Tue, 03 Oct 2017 21:05:03 GMT
The 'weird' thing about the mp_parser_died error is that, according to the
description of issue 745, the replication never finishes, because an item
that fails once seems to fail forever. In my case the items fail at first
but then seem to go through (possibly because the replication is retried),
as I can find the documents that generated the error (per the logs) in the
target db...
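
A minimal sketch of how that check could be scripted, assuming the document
IDs have already been collected from the logs by hand. The helper names
(doc_exists, missing_docs) are hypothetical; the only CouchDB feature relied
on is that HEAD /{db}/{docid} answers 200 for an existing document and 404
for a missing one. Authentication is left out, so a cluster that requires
credentials would need an opener that adds them.

```python
import urllib.error
import urllib.parse
import urllib.request


def doc_exists(base_url, db, doc_id, opener=urllib.request.urlopen):
    """Return True if HEAD /{db}/{doc_id} answers 200, False if it answers 404."""
    url = "{}/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(db, safe=""),
        urllib.parse.quote(doc_id, safe=""),
    )
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Anything else (401, 500, ...) is a real problem, so let it surface.
        raise


def missing_docs(base_url, db, doc_ids, opener=urllib.request.urlopen):
    """Return the subset of doc_ids that the target database does not have."""
    return [d for d in doc_ids if not doc_exists(base_url, db, d, opener)]


if __name__ == "__main__":
    # Hypothetical usage: the ID is the one visible in the trace quoted below.
    ids = ["df84bfda818ea150b249da89e8d79a38"]
    print(missing_docs("http://127.0.0.1:5984", "mydb", ids))
```

An empty list would mean every document that logged an error did reach the
target in the end, which matches what I am seeing.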

Regards

On Tue, Oct 3, 2017 at 10:52 PM Carlos Alonso <carlos.alonso@cabify.com>
wrote:

> To give some more context: this node is responsible for replicating a
> database with a large number of attachments, and it raises the 'famous'
> mp_parser_died,noproc error, which I think is this one:
> https://github.com/apache/couchdb/issues/745
>
> What I've identified so far from the logs is that, along with the error
> described above, this error also appears:
>
> [error] 2017-10-03T19:54:32.380379Z couchdb@couchdb-node-1 <0.30012.3408>
> 520e44b7ae req_err(2515771787) badmatch : ok
>     [<<"chttpd_db:db_doc_req/3 L780">>,<<"chttpd:process_request/1
> L295">>,<<"chttpd:handle_request_int/1 L231">>,<<"mochiweb_http:headers/6
> L91">>,<<"proc_lib:init_p_do_apply/3 L240">>]
>
> Sometimes it appears just after the mp_parser_died error; sometimes the
> parser error happens without 'triggering' one of these badmatch errors.
>
> Then, after a while of this sequence, the initially described
> sel_conn_closed error starts being raised for all requests and the node
> freezes. It is unresponsive but is still not removed from the cluster, so
> it holds on to its replications and, obviously, replicates nothing until
> it is restarted.
>
> I can also see interleaved unauthorized errors, which don't make much
> sense, as I'm the only one accessing this cluster:
>
> [error] 2017-10-03T19:33:47.022572Z couchdb@couchdb-node-1 <0.32501.3323>
> c683120c97 rexi_server throw:{unauthorized,<<"You are not authorized to
> access this db.">>}
> [{couch_db,open,2,[{file,"src/couch_db.erl"},{line,99}]},{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,261}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
>
>
> To me, it feels like the mp_parser_died error slowly breaks something that
> eventually leaves the node unresponsive, as those errors happen a lot in
> that particular replication.
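
One way to test that hunch would be to count each error signature per hour
and see whether sel_conn_closed only shows up after a build-up of
mp_parser_died. The following is a rough sketch, not a finished tool: the
function name tally_by_hour is hypothetical, the marker strings are the ones
quoted in this thread, and the regular expression assumes every log entry
starts like the "[error] 2017-10-03T19:54:32.380379Z ..." lines shown here.
Continuation lines are attributed to the entry above them, and each marker
is counted once per entry.

```python
import re
from collections import Counter, defaultdict

# Substrings taken from the errors quoted in this thread.
MARKERS = ("mp_parser_died", "badmatch", "sel_conn_closed", "unauthorized")

# Start of an entry, e.g. "[error] 2017-10-03T19:54:32.380379Z couchdb@...".
# Group 2 captures the date plus the hour, which is used as the bucket key.
ENTRY_START = re.compile(r"^\[(\w+)\] (\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}")


def tally_by_hour(lines):
    """Return {'YYYY-MM-DDTHH': {marker: number_of_entries}}."""
    buckets = defaultdict(Counter)
    current_hour = None
    seen_in_entry = set()
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            # A new log entry begins: switch bucket and reset the per-entry set.
            current_hour = match.group(2)
            seen_in_entry = set()
        if current_hour is None:
            continue  # text before the first recognisable entry
        for marker in MARKERS:
            if marker in line and marker not in seen_in_entry:
                buckets[current_hour][marker] += 1
                seen_in_entry.add(marker)
    return {hour: dict(counts) for hour, counts in buckets.items()}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python tally.py /path/to/couchdb.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for hour, counts in sorted(tally_by_hour(handle).items()):
            print(hour, counts)
```

If the hours just before a freeze show a steady rise in mp_parser_died
followed by sel_conn_closed, that would support the idea; it would not prove
causation.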
>
> Regards and thanks a lot for your help!
>
>
> On Tue, Oct 3, 2017 at 7:59 PM Joan Touzet <wohali@apache.org> wrote:
>
>> Is there more to the error? All this shows us is that the replicator
>> itself attempted a POST and had the connection closed on it. (Remember
>> that the replicator is basically just a custom client that sits
>> alongside CouchDB on the same machine.) There should be more to the
>> error log that shows why CouchDB hung up the phone.
>>
>> ----- Original Message -----
>> From: "Carlos Alonso" <carlos.alonso@cabify.com>
>> To: "user" <user@couchdb.apache.org>
>> Sent: Tuesday, 3 October, 2017 4:18:18 AM
>> Subject: Re: Trying to understand why a node gets 'frozen'
>>
>> Hello, this is happening every day, always on the same node. Any ideas?
>>
>> Thanks!
>>
>> On Sun, Oct 1, 2017 at 11:42 AM Carlos Alonso <carlos.alonso@cabify.com>
>> wrote:
>>
>> > Hello everyone!!
>> >
>> > I'm trying to understand an issue we're experiencing on CouchDB 2.1.0
>> > running on Ubuntu 14.04. The cluster is currently replicating from
>> > another source cluster, and we have seen that one node freezes from
>> > time to time and has to be restarted before it responds again.
>> >
>> > Before becoming unresponsive, the node throws a lot of {error,
>> > sel_conn_closed}. See an example trace below.
>> >
>> > [error] 2017-10-01T05:25:23.921126Z couchdb@couchdb-1 <0.13489.0>
>> > -------- gen_server <0.13489.0> terminated with reason:
>> > {checkpoint_commit_failure,<<"Failure on target commit:
>> > {'EXIT',{http_request_failed,\"POST\",\n                             \"
>> > http://127.0.0.1:5984/mydb/_ensure_full_commit\",\n
>> >        {error,sel_conn_closed}}}">>}
>> >   last msg: {'EXIT',<0.10626.0>,{checkpoint_commit_failure,<<"Failure on
>> > target commit: {'EXIT',{http_request_failed,\"POST\",\n
>> >          \"http://127.0.0.1:5984/mydb/_ensure_full_commit\",\n
>> >                  {error,sel_conn_closed}}}">>}}
>> >      state: {state,<0.10626.0>,<0.13490.0>,20,
>> >   {httpdb,"https://source_ip/mydb/",nil,[{"Accept","application/json"},{"Authorization","Basic ..."},{"User-Agent","CouchDB-Replicator/2.1.0"}],30000,[{is_ssl,true},{socket_options,[{keepalive,true},{nodelay,false}]},{ssl_options,[{depth,3},{verify,verify_none}]}],10,250,<0.11931.0>,20,nil,undefined},
>> >   {httpdb,"http://127.0.0.1:5984/mydb/",nil,[{"Accept","application/json"},{"Authorization","Basic ..."},{"User-Agent","CouchDB-Replicator/2.1.0"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,<0.11995.0>,20,nil,undefined},
>> >   [],<0.25756.4748>,nil,{<0.13490.0>,#Ref<0.0.724041731.98305>},[{docs_read,1},{missing_checked,1},{missing_found,1}],nil,nil,
>> >   {batch,[<<"{\"_id\":\"df84bfda818ea150b249da89e8d79a38\",\"_rev\":\"1-ebb0119fbdcad604ad372fa6e05d06a2\",...\":{\"start\":1,\"ids\":[\"ebb0119fbdcad604ad372fa6e05d06a2\"]}}">>],605}}
>> >
>> > The particular node is 'responsible' for a replication that has quite a
>> > lot of {mp_parser_died,noproc} errors, which AFAIK is a known bug
>> > (https://github.com/apache/couchdb/issues/745), but I don't know whether
>> > the two are related.
>> >
>> > When that happens, simply restarting the node gets it running properly
>> > again.
>> >
>> > Any help would be really appreciated.
>> >
>> > Regards
>> > --
>> > [image: Cabify - Your private Driver] <http://www.cabify.com/>
>> >
>> > *Carlos Alonso*
>> > Data Engineer
>> > Madrid, Spain
>> >
>> > carlos.alonso@cabify.com
>> >
>> > Try it for free with this code
>> > #CARLOSA6319 <https://cabify.com/i/carlosa6319>
>> > [image: Facebook] <http://cbify.com/fb_ES> [image: Twitter]
>> > <http://cbify.com/tw_ES> [image: Instagram] <http://cbify.com/in_ES>
>> > [image: Linkedin] <https://www.linkedin.com/in/mrcalonso>
>> >
>> --
>> This message and any attached file are intended exclusively for the
>> addressee, and it may be confidential. You are not allowed to copy or
>> disclose it without Cabify's prior written authorization. If you are not
>> the intended recipient please delete it from your system and notify us by
>> e-mail.
>>

