From: "Jeff Hinrichs - DM&T"
To: user@couchdb.apache.org
Reply-To: tech@dundeemt.com
Subject: Re: replication error
Date: Thu, 29 Jan 2009 23:27:50 -0600
Message-ID: <5aaed53f0901292127s8c9385bme7d2dda9422c2602@mail.gmail.com>
In-Reply-To: <2EB48D00-C388-4F03-9914-76612264326D@gmail.com>

On Thu, Jan 29, 2009 at 9:12 AM, Adam Kocoloski wrote:
> Hi Jeff, thanks for the extra info.  Something funny is going on here.
> These logs don't agree with your description of how you set up the
> replication.  In particular, in the .52 log it looks like you sent a
> replication request to .52 telling it to pull from itself.  Those debug
> lines that start with Url: are HTTP requests that the replicator is about to
> make.
>
> On .194 the first line in the logfile looks like a response to an HTTP
> request from a remote replicator trying to pull from .194.  But then in the
> headers you see a {'Host',"192.168.2.52"} tuple.
>
> Could you have mixed up which log was which in this email?  It would make a
> lot more sense.  Let's confirm that first.  Best, Adam
>
> P.S.
>
> The logging in mochiweb_request would look like
>
>    case gen_tcp:send(Socket, Data) of
>        ok ->
>            ok;
> -      _ ->
> +      {error, Reason} ->
> +          io:format("mochiweb_request:send failed with reason ~p", [Reason]),
>            exit(normal)
>    end.
>

I am initiating the replication via Futon on machine .194,
remote: http://192.168.2.52:5984/delasco-invoices -> local test5-mars

This is what I am seeing in the log file on .194:

[Fri, 30 Jan 2009 05:11:47 GMT] [error] [emulator] Error in process <0.98.0> with exit value: {function_clause,[{lists,map,[#Fun,ok]},{couch_rep,open_doc_revs,4},{couch_rep,'-enum_docs_parallel/3-fun-1-',3},{couch_rep,'-spawn_worker/3-fun-0-',3}]}

[Fri, 30 Jan 2009 05:11:47 GMT] [debug] [<0.107.0>] couch_rep HTTP get request: "http://192.168.2.52:5984/delasco-invoices/INV00541353?revs=true&attachments=true&latest=true&open_revs=[\"2461225383\"]"

[Fri, 30 Jan 2009 05:11:47 GMT] [info] [<0.117.0>] retrying couch_rep HTTP get request due to {error, connection_closed}: "http://192.168.2.52:5984/delasco-invoices/INV00541343?revs=true&attachments=true&latest=true&open_revs=[\"2308904194\"]"

[Fri, 30 Jan 2009 05:11:47 GMT] [info] [<0.145.0>] retrying couch_rep HTTP get request due to {error, connection_closed}: "http://192.168.2.52:5984/delasco-invoices/INV00541315?revs=true&attachments=true&latest=true&open_revs=[\"3170383356\"]"

[Fri, 30 Jan 2009 05:11:47 GMT] [error] [<0.145.0>] couch_rep HTTP get request failed after 10 retries: "http://192.168.2.52:5984/delasco-invoices/INV00541315?revs=true&attachments=true&latest=true&open_revs=[\"3170383356\"]"

[Fri, 30 Jan 2009 05:11:48 GMT] [error] [emulator] Error in process <0.145.0> with exit value: {function_clause,[{lists,map,[#Fun,ok]},{couch_rep,open_doc_revs,4},{couch_rep,'-enum_docs_parallel/3-fun-1-',3},{couch_rep,'-spawn_worker/3-fun-0-',3}]}

[Fri, 30 Jan 2009 05:11:48 GMT] [debug] [<0.117.0>] couch_rep HTTP get request: "http://192.168.2.52:5984/delasco-invoices/INV00541343?revs=true&attachments=true&latest=true&open_revs=[\"2308904194\"]"

[Fri, 30 Jan 2009 05:11:48 GMT] [info] [<0.138.0>] retrying couch_rep HTTP get request due to {error, connection_closed}: "http://192.168.2.52:5984/delasco-invoices/INV00541322?revs=true&attachments=true&latest=true&open_revs=[\"3949544425\"]"

[Fri, 30 Jan 2009 05:11:48 GMT] [debug] [<0.138.0>] couch_rep HTTP get request: "http://192.168.2.52:5984/delasco-invoices/INV00541322?revs=true&attachments=true&latest=true&open_revs=[\"3949544425\"]"

[Fri, 30 Jan 2009 05:11:50 GMT] [info] [<0.119.0>] retrying couch_rep HTTP get request due to {error, connection_closed}: "http://192.168.2.52:5984/delasco-invoices/INV00541341?revs=true&attachments=true&latest=true&open_revs=[\"2313067153\"]"

[Fri, 30 Jan 2009 05:11:50 GMT] [error] [<0.119.0>] couch_rep HTTP get request failed after 10 retries: "http://192.168.2.52:5984/delasco-invoices/INV00541341?revs=true&attachments=true&latest=true&open_revs=[\"2313067153\"]"

[Fri, 30 Jan 2009 05:11:50 GMT] [error] [emulator] Error in process <0.119.0> with exit value: {function_clause,[{lists,map,[#Fun,ok]},{couch_rep,open_doc_revs,4},{couch_rep,'-enum_docs_parallel/3-fun-1-',3},{couch_rep,'-spawn_worker/3-fun-0-',3}]}

-------

Sometimes the couchdb process on .194 just goes away.  Also, if I
attempt to replicate the same data, only with no attachments or with
smaller attachments (70k PDFs), it runs just fine.

I have svn up'd and am now running 0.9.0a739170-incubating on both
machines.  I had it go 100 documents, then blow up; then I restarted
couch on .192 and retried, and it finished out the remaining 88 docs
(188 total in the db on .52).

I'm svn upping again and will try some more.

>
> On Jan 28, 2009, at 7:07 PM, Jeff Hinrichs - DM&T wrote:
>
>> On Wed, Jan 28, 2009 at 10:03 AM, Adam Kocoloski wrote:
>>>
>>> Hi Jeff, I think I'll need a reproducible test case or a little more
>>> information to help debug this.  mochiweb_request:send exits on any error
>>> returned by the underlying gen_tcp:send, and unfortunately it doesn't bother
>>> to log the reason for the error.  You might try adding a debug statement to
>>> line 125 of mochiweb_request.erl to figure out the reason why .52 failed to
>>> serve this document GET request.
>>
>> Not an erlanger but can vi, can you tell me what to put there?  Currently
>> it is:
>>    exit(normal)
>>
>>>
>>> When you say that "the process has died" on .194, you mean the replication
>>> process, right?  Surely that error didn't bring down the entire database?
>>> Best,
>>>
>>> Adam
>>
>> Sorry, but I mean the entire couchdb process, not just the replication
>> process
>> I initiate the request from the remote machine (.194) it fails out
>> after a while with, the error, then
>>
>> jlh@mars:~$ ps ax|grep couch
>> 28145 pts/2    S+     0:00 tail -f /usr/local/var/log/couchdb/couch.log
>> 28375 pts/3    S+     0:00 grep couch
>> jlh@mars:~$ sudo /usr/local/etc/init.d/couchdb status
>>
>> jlh@mars:~$
>>
>> All couch related processes are now -- gone. on .194
>>
>> -----------------
>> .194 log shows:
>> [Wed, 28 Jan 2009 23:50:46 GMT] [debug] [<0.1742.0>] 'GET' /invoices1/INV00541323?revs=true&attachments=true&latest=true&open_revs=["2017454730"] {1,1}
>> Headers: [{'Connection',"keep-alive"},{'Host',"192.168.2.52"},{"Te",[]}]
>>
>> [Wed, 28 Jan 2009 23:50:46 GMT] [error] [<0.1742.0>] Uncaught error in HTTP request: {exit,normal}
>>
>> [Wed, 28 Jan 2009 23:50:46 GMT] [debug] [<0.1742.0>] Stacktrace:
>> [{mochiweb_request,send,2},
>>  {couch_httpd,send_chunk,2},
>>  {couch_httpd_db,'-db_doc_req/3-fun-1-',4},
>>  {lists,foldl,3},
>>  {couch_httpd_db,db_doc_req,3},
>>  {couch_httpd_db,do_db_req,2},
>>  {couch_httpd,handle_request,3},
>>  {mochiweb_http,headers,4}]
>>
>> [Wed, 28 Jan 2009 23:50:46 GMT] [debug] [<0.1742.0>] HTTPd 500 error response:
>> {"error":"error","reason":"normal"}
>> --------------
>> .52 log shows:
>> [Wed, 28 Jan 2009 23:49:35 GMT] [debug] [<0.452.0>] Url: "http://192.168.2.52:5984/invoices1/INV00541300?revs=true&attachments=true&latest=true&open_revs=[\"1219578511\"]"
>>
>> [Wed, 28 Jan 2009 23:49:35 GMT] [debug] [<0.453.0>] Url: "http://192.168.2.52:5984/invoices1/INV00653664?revs=true&attachments=true&latest=true&open_revs=[\"2059085364\"]"
>>
>> [Wed, 28 Jan 2009 23:49:35 GMT] [debug] [<0.454.0>] Url: "http://192.168.2.52:5984/invoices1/INV00652895?revs=true&attachments=true&latest=true&open_revs=[\"2562102070\"]"
>>
>> [Wed, 28 Jan 2009 23:49:35 GMT] [debug] [<0.455.0>] Url: "http://192.168.2.52:5984/invoices1/INV00652894?revs=true&attachments=true&latest=true&open_revs=[\"268796200\"]"
>>
>> [Wed, 28 Jan 2009 23:49:35 GMT] [info] [<0.352.0>] 192.168.2.52 - - 'POST' /_replicate 500
>> -----------------
>> couchdb on the remote machine (.52) is just humming along fine.
>>
>> Let me know what you need and I'll do my best.  Sorry for the long
>> pause between the first report and now.  I was dashing out of the
>> house to work and wanted to get the initial report out.
>>
>> Regards,
>>
>> Jeff
>>
>>>
>>> On Jan 28, 2009, at 10:17 AM, Jeff Hinrichs - DM&T wrote:
>>>
>>>> replicating from 192.168.2.52 [0.9.0a738346-incubating] ->
>>>> 192.168.2.194 [0.9.0a738497-incubating]
>>>>
>>>> -192.168.2.52:-
>>>> Eshell V5.6.4 (abort with ^G)
>>>> 1> init:script_id().
>>>> {"OTP APN 181 01","R12B"}
>>>>
>>>> -192.168.2.194-
>>>> Eshell V5.6.3 (abort with ^G)
>>>> 1> init:script_id().
>>>> {"OTP APN 181 01","R12B"}
>>>>
>>>> replication initiated in futon on .194 pulling from .52
>>>>
>>>>
>>>> During the process, I see this in the log...
>>>>
>>>> [Wed, 28 Jan 2009 14:29:47 GMT] [debug] [<0.62.0>] 'GET' /invoices/INV00651983?revs=true&attachments=true&latest=true&open_revs=["3597612357"] {1,1}
>>>> Headers: [{'Connection',"keep-alive"},{'Host',"192.168.2.52"},{"Te",[]}]
>>>>
>>>> [Wed, 28 Jan 2009 14:29:53 GMT] [error] [<0.62.0>] Uncaught error in HTTP request: {exit,normal}
>>>>
>>>> [Wed, 28 Jan 2009 14:29:53 GMT] [debug] [<0.62.0>] Stacktrace:
>>>> [{mochiweb_request,send,2},
>>>>  {couch_httpd,send_chunk,2},
>>>>  {couch_httpd_db,'-db_doc_req/3-fun-1-',4},
>>>>  {lists,foldl,3},
>>>>  {couch_httpd_db,db_doc_req,3},
>>>>  {couch_httpd_db,do_db_req,2},
>>>>  {couch_httpd,handle_request,3},
>>>>  {mochiweb_http,headers,4}]
>>>>
>>>> [Wed, 28 Jan 2009 14:29:53 GMT] [debug] [<0.62.0>] HTTPd 500 error response:
>>>> {"error":"error","reason":"normal"}
>>>>
>>>> Checking the status of couch on .194 at this point shows that the
>>>> process has died
>>>>
>>>> repeated attempts fail, on different documents
>>>>
>>>>
>>>> regards,
>>>>
>>>> Jeff
>>>
>>>
>
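
For a reproducible test case that does not depend on Futon, the same
pull replication could probably be scripted as a POST to /_replicate on
the node doing the pulling (the quoted .52 log shows a 'POST' /_replicate
entry).  The following is only a rough sketch, not a tested reproduction:
it assumes the request body is JSON with "source" and "target" keys and
that CouchDB on .194 listens on port 5984.  The helper name
build_replication_request is made up for illustration, and the request
is only built here, not sent.

```python
import json
import urllib.request


def build_replication_request(host, source, target):
    """Build (but do not send) a POST /_replicate request for `host`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Pull from .52 into the local database on .194, as in the Futon run.
# The host:port for .194 is an assumption; the database names come from
# the report above.
req = build_replication_request(
    host="192.168.2.194:5984",
    source="http://192.168.2.52:5984/delasco-invoices",
    target="test5-mars",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually trigger the replication (needs a reachable CouchDB):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Triggering the replication this way while tailing couch.log on both
machines might also make it easier to confirm which node produced which
log lines.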