Reply-To: dev@couchdb.apache.org
Date: Thu, 6 Dec 2012 16:12:24 +0000 (UTC)
From: "Dave Cottlehuber (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] [Commented] (COUCHDB-1346) CouchDB hangs during start of view indexing


    [ https://issues.apache.org/jira/browse/COUCHDB-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511490#comment-13511490 ]

Dave Cottlehuber commented on COUCHDB-1346:
-------------------------------------------

Generally, there are repeated hangs during basic things like fputs/freads.
I see buffers in couchjs that appear to be missing a character such as
'\nnteger', & from what I read couchjs is not returning from the blocking
fputs/fgets OS call. I can't tell if the port is still open on the erlang
side or not, I assume so. e.g.

    _ptr    0x74aa32a0 "[\"map_doc\",{\"_id\":\"99\",\"_rev\":\"1-463ef1de0f185aeab70ce3c346a96572\",\"integer\":99,\"string\":\"99\"}]\n}]\nnteger);\\n })\"]\n"    char *

I think this is because we are doing open_port(... binary ...) in Erlang,
but the stream is opened in text mode on Windows only. Thus the next line
will lose a character (explaining the '\nnteger' stuff I was seeing in the
debugger), and eventually we'll hit a point where we are waiting on an
additional character that will never be sent.

I tried changing fopen to "rb" but I think we only use this for files like
main.js and not stdin:

diff --git i/src/couchdb/priv/couch_js/util.c w/src/couchdb/priv/couch_js/util.c
index 5c88402..e46f66f 100644
--- i/src/couchdb/priv/couch_js/util.c
+++ w/src/couchdb/priv/couch_js/util.c
@@ -33,7 +33,7 @@ slurp_file(const char* file, char** outbuf_p)
     if(strcmp(file, "-") == 0) {
         fp = stdin;
     } else {
-        fp = fopen(file, "r");
+        fp = fopen(file, "rb");
         if(fp == NULL) {
             fprintf(stderr, "Failed to read file: %s\n", file);
             exit(3);

So the next step is to dig into the Erlang side instead and try sending
"\r\n" instead of just "\n". Maybe we can drop binary as an option for
Windows ports. I'm assuming this is in src/couchdb/couch_os_process.erl or
src/couchdb/couch_os_daemons.erl and related to
http://erldocs.com/R15B/erts/erlang.html?i=0&search=open_port#open_port/2
which seems to be set here:

git grep PORT_OPTIONS
src/couchdb/couch_os_daemons.erl:-define(PORT_OPTIONS, [stream, {line, 1024}, binary, exit_status, hide]).
src/couchdb/couch_os_daemons.erl:    Port = open_port({spawn, Spawnkiller ++ " " ++ Command}, ?PORT_OPTIONS),
src/couchdb/couch_os_process.erl:-define(PORT_OPTIONS, [stream, {line, 4096}, binary, exit_status, hide]).
src/couchdb/couch_os_process.erl:    start_link(Command, Options, ?PORT_OPTIONS).
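
If the text-mode theory above holds, one way to test it from the couchjs
side would be to put the standard streams into binary mode before any I/O
happens, so that the Microsoft C runtime stops translating line endings on
the pipe to the Erlang port. This is only a minimal sketch: it assumes the
MSVC CRT calls _setmode() and _fileno() with the _O_BINARY flag, and the
helper name couch_set_binary_stdio is hypothetical (nothing by that name
exists in couchjs). On other platforms it does nothing, since there is no
text/binary distinction there.

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode, _fileno */
#endif

/*
 * Hypothetical helper (not part of couchjs): switch stdin and stdout to
 * binary mode so the C runtime performs no CR/LF translation on the pipe.
 * Returns 0 on success and -1 if either stream could not be switched.
 */
int
couch_set_binary_stdio(void)
{
#ifdef _WIN32
    /* _setmode returns the previous mode, or -1 on failure. */
    if (_setmode(_fileno(stdin), _O_BINARY) == -1) {
        return -1;
    }
    if (_setmode(_fileno(stdout), _O_BINARY) == -1) {
        return -1;
    }
#endif
    /* Elsewhere there is no text/binary distinction, so nothing to do. */
    return 0;
}
```

It would need to be called at the top of main(), before the first read from
stdin. Whether this, or a change to the port options on the Erlang side, is
the right fix is still an open question at this point.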

Stack Traces:

### COUCHJS ###

Callstack for Thread 1 (Thread Id: 1372 (0x55c)):
 Index  Function
 --------------------------------------------------------------------------------
 1      ntdll.dll!_NtWriteFile@36()
 2      KernelBase.dll!_WriteFile@20()
 3      msvcr100.dll!__write_nolock()
 4      msvcr100.dll!__write()
 5      msvcr100.dll!__flush()
 6      msvcr100.dll!__fflush_nolock()
 7      msvcr100.dll!_fflush()
*8      couchjs.exe!couch_print(JSContext * cx=0x025874f0, unsigned int argc=1, unsigned __int64 * argv=0x026401a0)
 9      couchjs.exe!print(JSContext * cx=0x025874f0, unsigned int argc=1, unsigned __int64 * vp=0x02640190)
 10     mozjs185-1.0.dll!6decf09d()
 11     [Frames below may be incorrect and/or missing, no symbols loaded for mozjs185-1.0.dll]
 12     ntdll.dll!_RtlpHeapFindListLookupEntry@20()
 13     ntdll.dll!_RtlpFindEntry@8()
 14     0260b5c8()
 15     ntdll.dll!_RtlpHeapFindListLookupEntry@20()
 16     ntdll.dll!_RtlpFindEntry@8()
 17     ntdll.dll!@RtlpFreeHeap@16()
 18     ntdll.dll!_RtlpHeapAddListEntry@24()
 19     ntdll.dll!@RtlpCreateSplitBlock@28()
 20     ntdll.dll!@RtlpAllocateHeap@24()
 21     ntdll.dll!_RtlAllocateHeap@12()

Which is in util.c:

couch_print(JSContext* cx, uintN argc, jsval* argv)
{
    char *bytes = NULL;
    FILE *stream = stdout;

    if (argc) {
        if (argc > 1 && argv[1] == JSVAL_TRUE) {
            stream = stderr;
        }
        bytes = enc_string(cx, argv[0], NULL);
        if(!bytes) return;

        fprintf(stream, "%s", bytes);
        JS_free(cx, bytes);
    }

    fputc('\n', stream);
    // dch: we never return from this
    fflush(stream);
}

### COUCHSPAWNKILLABLE ###

couchspawnkillable looks clean to me, it's just waiting for the couchjs
process to terminate:

    // Wait for the process to terminate so we can reflect the exit code
    // back to couch.
    WaitForSingleObject(pi.hProcess, INFINITE);
    if (!GetExitCodeProcess(pi.hProcess, &exitcode))
        return 6;
    return exitcode;
}


> CouchDB hangs during start of view indexing
> -------------------------------------------
>
>                 Key: COUCHDB-1346
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1346
>             Project: CouchDB
>          Issue Type: Bug
>          Components: View Server Support
>    Affects Versions: 1.3
>         Environment: Windows 7 Enterprise only, not able to replicate on Mac OS X.
> Erlang R14B03 + crypto patches.
> Mozilla Javascript 1.8.5
>            Reporter: Dave Cottlehuber
>            Assignee: Adam Kocoloski
>            Priority: Blocker
>              Labels: Windows
>             Fix For: 1.3
>
>
> [info] [<0.20499.0>] Opening index for db: test_suite_db idx: f4421bf4e9c9bf2acb3db91bca9e9adc sig: "d5c87ad33242b181f86be2139cbccd96"
> [info] [<0.20504.0>] Starting index update for db: test_suite_db idx: f4421bf4e9c9bf2acb3db91bca9e9adc
> [info] [<0.20334.0>] 172.16.40.1 - - POST /test_suite_db/_temp_view 500
> [info] [<0.20513.0>] 172.16.40.1 - - GET /_utils/couch_tests.html?script/couch_tests.js 200
> [info] [<0.20514.0>] 172.16.40.1 - - GET /_utils/index.html 200
> [info] [<0.20060.0>] 172.16.40.1 - - DELETE /test_suite_db_a/ 200
> [info] [<0.20407.0>] 172.16.40.1 - - GET /test_suite_reports/ 404
> [info] [<0.20058.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20071.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20069.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20484.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20364.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20062.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20388.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20345.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20072.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20059.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20061.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> [info] [<0.20472.0>] 172.16.40.1 - - DELETE /test_suite_db/ 200
> [error] [<0.20050.0>] ** Generic server couch_index_server terminating
> ** Last message in was {'$gen_cast',{reset_indexes,<<"test_suite_db">>}}
> ** When Server state == {st,"../var/lib/couchdb"}
> ** Reason for termination ==
> ** {{case_clause,{error,eacces}},
>     [{couch_file,'-nuke_dir/2-fun-0-',3},
>      {lists,foreach,2},
>      {couch_file,nuke_dir,2},
>      {couch_index_server,handle_cast,2},
>      {gen_server,handle_msg,5},
>      {proc_lib,init_p_do_apply,3}]}
> =ERROR REPORT==== 23-Nov-2011::21:17:14 ===
> ** Generic server couch_index_server terminating
> ** Last message in was {'$gen_cast',{reset_indexes,<<"test_suite_db">>}}
> ** When Server state == {st,"../var/lib/couchdb"}
> ** Reason for termination ==
> ** {{case_clause,{error,eacces}},
>     [{couch_file,'-nuke_dir/2-fun-0-',3},
>      {lists,foreach,2},
>      {couch_file,nuke_dir,2},
>      {couch_index_server,handle_cast,2},
>      {gen_server,handle_msg,5},
>      {proc_lib,init_p_do_apply,3}]}
> [error] [<0.20050.0>] {error_report,<0.19957.0>,
>     {<0.20050.0>,crash_report,
>      [[{initial_call,
>            {couch_index_server,init,['Argument__1']}},
>        {pid,<0.20050.0>},
>        {registered_name,couch_index_server},
>        {error_info,
>            {exit,
>                {{case_clause,{error,eacces}},
>                 [{couch_file,'-nuke_dir/2-fun-0-',3},
>                  {lists,foreach,2},
>                  {couch_file,nuke_dir,2},
>                  {couch_index_server,handle_cast,2},
>                  {gen_server,handle_msg,5},
>                  {proc_lib,init_p_do_apply,3}]},
>                [{gen_server,terminate,6},
>                 {proc_lib,init_p_do_apply,3}]}},
>        {ancestors,
>            [couch_secondary_services,couch_server_sup,
>             <0.19958.0>]},
>        {messages,
>            [{'$gen_cast',
>                 {reset_indexes,<<"test_suite_db_a">>}}]},
>        {links,[<0.20051.0>,<0.20026.0>]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,1597},
>        {stack_size,24},
>        {reductions,12211}],
>       [{neighbour,
>            [{pid,<0.20051.0>},
>             {registered_name,[]},
>             {initial_call,
>                 {couch_event_sup,init,['Argument__1']}},
>             {current_function,{gen_server,loop,6}},
>             {ancestors,
>                 [couch_index_server,
>                  couch_secondary_services,
>                  couch_server_sup,<0.19958.0>]},
>             {messages,[]},
>             {links,[<0.20050.0>,<0.20018.0>]},
>             {dictionary,[]},
>             {trap_exit,false},
>             {status,waiting},
>             {heap_size,233},
>             {stack_size,9},
>             {reductions,32}]}]]}}
> =CRASH REPORT==== 23-Nov-2011::21:17:14 ===
>   crasher:
>     initial call: couch_index_server:init/1
>     pid: <0.20050.0>
>     registered_name: couch_index_server
>     exception exit: {{case_clause,{error,eacces}},
>                      [{couch_file,'-nuke_dir/2-fun-0-',3},
>                       {lists,foreach,2},
>                       {couch_file,nuke_dir,2},
>                       {couch_index_server,handle_cast,2},
>                       {gen_server,handle_msg,5},
>                       {proc_lib,init_p_do_apply,3}]}
>       in function gen_server:terminate/6
>     ancestors: [couch_secondary_services,couch_server_sup,<0.19958.0>]
>     messages: [{'$gen_cast',{reset_indexes,<<"test_suite_db_a">>}}]
>     links: [<0.20051.0>,<0.20026.0>]
>     dictionary: []
>     trap_exit: true
>     status: running
>     heap_size: 1597
>     stack_size: 24
>     reductions: 12211
>   neighbours:
>     neighbour: [{pid,<0.20051.0>},
>                   {registered_name,[]},
>                   {initial_call,{couch_event_sup,init,['Argument__1']}},
>                   {current_function,{gen_server,loop,6}},
>                   {ancestors,[couch_index_server,couch_secondary_services,
>                               couch_server_sup,<0.19958.0>]},
>                   {messages,[]},
>                   {links,[<0.20050.0>,<0.20018.0>]},
>                   {dictionary,[]},
>                   {trap_exit,false},
>                   {status,waiting},
>                   {heap_size,233},
>                   {stack_size,9},
>                   {reductions,32}]
> [error] [<0.20026.0>] {error_report,<0.19957.0>,
>     {<0.20026.0>,supervisor_report,
>      [{supervisor,{local,couch_secondary_services}},
>       {errorContext,child_terminated},
>       {reason,
>           {{case_clause,{error,eacces}},
>            [{couch_file,'-nuke_dir/2-fun-0-',3},
>             {lists,foreach,2},
>             {couch_file,nuke_dir,2},
>             {couch_index_server,handle_cast,2},
>             {gen_server,handle_msg,5},
>             {proc_lib,init_p_do_apply,3}]}},
>       {offender,
>           [{pid,<0.20050.0>},
>            {name,index_server},
>            {mfargs,{couch_index_server,start_link,[]}},
>            {restart_type,permanent},
>            {shutdown,brutal_kill},
>            {child_type,worker}]}]}}
> OS process tree at this time is:
> Process information for SENDAI:
> Name                Pid Pri Thd  Hnd      VM     WS   Priv
> Idle                  0   0   2    0       0     24      0
> System                4   8  79  477    3380    304    108
> explorer           1984   8  21  664  213732  46340  21540
> cmd                2104   8   1   25   48132   3304   2144
> pslist             2776  13   1  133   63584   4976   2000
> cmd                2504   8   1   26   44980   3512   3012
> werl               2680   8  16  390  196232  40064  28628
> win32sysinfo       1152   8   1   21   12624   2124    640
> couchspawnkillable 1444   8   1   30   12992   2284    688
> couchjs            1468   8   1   39   55900   6572   4056
> couchspawnkillable 2740   8   1   30   12992   2280    684
> couchjs            2756   8   1   39   55900   7108   4444
> Erlang resumes running CouchDB when couchjs procs are terminated with extreme
> prejudice. The hang still occurs after reverting fdmanana's COUCHDB-1334
> commit. This could be a race condition during invalidation of the views, and
> subsequent deletion of the related ddoc view directory prior to reindexing.
> On Windows a filesystem object cannot be deleted if there are open handles
> remaining.