incubator-couchdb-user mailing list archives

From "Jeff Hinrichs - DM&T" <dunde...@gmail.com>
Subject Re: Replication is Failing is this a known problem?
Date Sat, 28 Feb 2009 01:02:43 GMT
On Fri, Feb 27, 2009 at 8:57 AM, Adam Kocoloski
<adam.kocoloski@gmail.com> wrote:
> Hi Jeff, I can pick this one up, but not before Monday. We do have some
> replicating-attachment JIRA tickets open and active, but it looks like
> there's some new stuff in this report too.  Feel free to file another one.
>  Best,
>
> Adam
I'll review the current JIRA tickets first to avoid filing a dupe. I'll
also work on building a reproducible test case for you.  I hope a
Python script is OK with you.

Regards,

Jeff
>
> Sent from my iPhone
>
> On Feb 27, 2009, at 9:13 AM, "Jeff Hinrichs - DM&T" <jeffh@dundeemt.com>
> wrote:
>
>> Attempting to replicate a database with largish attachments (<= ~18MB
>> of attachments in a doc, less than 200 docs) from one machine to
>> another fails consistently and at the same point.
>>
>> Scenario:
>> Both servers are running from HEAD and I've been tracking for some
>> time.  This problem has been around as long as I've been using couch.
>>
>> Machine A holds the original database; Machine B is the server that is
>> doing a PULL replication.
>>
>> During the replication, Machine A starts showing the following
>> sporadically in the log:
>> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5902.3>] 'GET'
>> /delasco-invoices/INV00652429?revs=true&attachments=true&latest=true&open_revs=["425644723"]
>> {1,1}
>> Headers: [{'Host',"192.168.2.52:5984"}]
>>
>> [Fri, 27 Feb 2009 14:02:48 GMT] [error] [<0.5901.3>] Uncaught error in
>> HTTP request: {exit,normal}
>>
>> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] Stacktrace:
>> [{mochiweb_request,send,2},
>>            {couch_httpd,send_chunk,2},
>>            {couch_httpd_db,db_doc_req,3},
>>            {couch_httpd_db,do_db_req,2},
>>            {couch_httpd,handle_request,3},
>>            {mochiweb_http,headers,5},
>>            {proc_lib,init_p,5}]
>>
>> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] HTTPd 500 error
>> response:
>> {"error":"error","reason":"normal"}
>>
>> As the replication continues, the frequency of these "Uncaught
>> error in HTTP request: {exit,normal}" errors increases until the
>> error is repeated constantly.  Then Machine B stops sending requests:
>> no more log output, no errors.  The last thing in Machine B's log file is:
>> [Fri, 27 Feb 2009 14:03:24 GMT] [info] [<0.20893.1>] retrying
>> couch_rep HTTP get request due to {error, req_timedout}:
>> [104,116,116,112,58,47,47,49,57,50,46,49,54,56,46,50,46,53,50,58,
>>  53,57,56,52,47,100,101,108,97,115,99,111,45,105,110,118,111,105,
>>  99,101,115,47,73,78,86,48,48,54,53,50,49,51,56,63,114,101,118,
>>  115,61,116,114,117,101,38,97,116,116,97,99,104,109,101,110,116,
>>  115,61,116,114,117,101,38,108,97,116,101,115,116,61,116,114,117,
>>  101,38,111,112,101,110,95,114,101,118,115,61,91,34,
>>  <<"3070455362">>,34,93]
>>
>> A request for status from the couchdb init.d script returns nothing
>> and checking the processes returns:
>> (demo-couchdb)jlh@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep cou
>> 29281 pts/2    S+     0:00 grep cou
>> (demo-couchdb)jlh@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep beam
>> 29305 pts/2    R+     0:00 grep beam
>>
>> In fact, couch has gone away completely on Machine B.  Its death is
>> so quick that it can't even say why.
>>
>> Attempts to incrementally replicate after the first failure die at
>> exactly the same place.
>>
>> I can replicate this same database on the same machine from one
>> database to another without issue.  I can dump and reload the database
>> with no problems.
>>
>> I have reported this earlier and no one seemed to have an answer.  Is
>> there a specific issue in JIRA that addresses this problem?  If not,
>> is what I have here enough to start one and should I?
>>
>> Regards,
>>
>> Jeff Hinrichs
>
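
The integer list in Machine B's last log line above is an Erlang character list with one embedded binary, which is hard to read as printed. Below is a minimal Python sketch for turning it back into text. The helper name `flatten_iolist` is hypothetical, and representing the binary <<"3070455362">> as a plain Python string is an assumption made for illustration.

```python
# Character codes copied from the req_timedout log line on Machine B.
# The embedded Erlang binary <<"3070455362">> is written here as a str
# (an assumption for illustration; Python has no direct equivalent).
logged = [
    104, 116, 116, 112, 58, 47, 47, 49, 57, 50, 46, 49, 54, 56, 46, 50, 46,
    53, 50, 58, 53, 57, 56, 52, 47, 100, 101, 108, 97, 115, 99, 111, 45,
    105, 110, 118, 111, 105, 99, 101, 115, 47, 73, 78, 86, 48, 48, 54, 53,
    50, 49, 51, 56, 63, 114, 101, 118, 115, 61, 116, 114, 117, 101, 38, 97,
    116, 116, 97, 99, 104, 109, 101, 110, 116, 115, 61, 116, 114, 117, 101,
    38, 108, 97, 116, 101, 115, 116, 61, 116, 114, 117, 101, 38, 111, 112,
    101, 110, 95, 114, 101, 118, 115, 61, 91, 34,
    "3070455362",
    34, 93,
]


def flatten_iolist(items):
    """Join integer character codes and embedded strings into one string."""
    return "".join(chr(x) if isinstance(x, int) else x for x in items)


url = flatten_iolist(logged)
print(url)
```

Decoded, the list is the URL of the GET request that timed out: http://192.168.2.52:5984/delasco-invoices/INV00652138?revs=true&attachments=true&latest=true&open_revs=["3070455362"]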
