couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [jira] [Commented] (COUCHDB-1346) CouchDB hangs during start of view indexing
Date Tue, 11 Dec 2012 08:21:21 GMT
+1 for A, -1 for B unless/until someone figures out what the root cause is.
I've looked at it and I can't say that I see what's causing the issue. My
intuition says that we're hitting some sort of race condition with the
message dropping, but I'd need extra logging that shows the interaction
much more clearly before I could say for certain.

While the general idea is sane, and I do understand that there are platform
differences, I can't see which differences may be leading to the current
Windows-specific issues. There may be a number of issues that aren't
platform specific but that we aren't exposing elsewhere due to platform
differences (i.e., perhaps we need to exceed buffer sizes and we currently
only do that on Windows, or some such).

To just say "we've not seen it yet" seems awfully foolhardy given the
number of times we've seen bugs crop up only after widespread usage of a
release.
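
As an illustration of the buffer-size hypothesis above, here is a toy model in Python of the "write all commands, then read all responses" pattern described in the quoted commit message. Everything in it is an assumption made for the sketch: the function name `batched_exchange_deadlocks`, the two fixed-capacity pipes, and the rule that a write blocks until the whole message fits. It is not how Erlang ports or couchjs are implemented, and it is not a diagnosis of COUCHDB-1346.

```python
from collections import deque


def batched_exchange_deadlocks(cmd_sizes, resp_sizes, buf_capacity):
    """Return True if writing every command before reading any response
    deadlocks in a toy model with two bounded pipes.

    Assumptions (hypothetical, for illustration only):
    - the parent writes commands in order and reads nothing until all
      commands have been written;
    - the child reads one command, then must finish writing its response
      before it reads the next command;
    - each pipe holds at most buf_capacity bytes and a write blocks until
      the whole message fits (every message is <= buf_capacity).
    """
    to_child = deque()    # sizes of commands written but not yet read
    to_child_bytes = 0
    to_parent_bytes = 0   # response bytes the parent has not read yet
    pending = None        # response the child is still trying to write
    next_cmd = 0
    next_resp = 0

    while next_cmd < len(cmd_sizes):
        progressed = False
        # Parent: write the next command if the command pipe has room.
        if to_child_bytes + cmd_sizes[next_cmd] <= buf_capacity:
            to_child.append(cmd_sizes[next_cmd])
            to_child_bytes += cmd_sizes[next_cmd]
            next_cmd += 1
            progressed = True
        # Child: read a command only when no response is waiting to go out.
        if pending is None and to_child:
            to_child_bytes -= to_child.popleft()
            pending = resp_sizes[next_resp]
            next_resp += 1
            progressed = True
        # Child: write the pending response if the response pipe has room.
        if pending is not None and to_parent_bytes + pending <= buf_capacity:
            to_parent_bytes += pending
            pending = None
            progressed = True
        if not progressed:
            return True   # parent and child are both blocked on a write
    return False          # all commands written; the parent can start reading


print(batched_exchange_deadlocks([40] * 10, [40] * 10, 1000))  # False
print(batched_exchange_deadlocks([40] * 10, [40] * 10, 100))   # True
```

In this model the same batch succeeds or hangs depending only on the buffer capacity, which is the kind of bug that is not platform specific yet may only show up on the platform with the smaller buffers.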


On Tue, Dec 11, 2012 at 2:01 AM, Dave Cottlehuber (JIRA) <jira@apache.org> wrote:

>
>     [https://issues.apache.org/jira/browse/COUCHDB-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528781#comment-13528781]
>
> Dave Cottlehuber commented on COUCHDB-1346:
> -------------------------------------------
>
> After a long night of bisectus interruptus, I get the same result:
>
> erl@werl /relax/couchdb
> $ git bisect good
> a851c6e5150d14221ca018587d76214856c1555a is the first bad commit
> commit a851c6e5150d14221ca018587d76214856c1555a
> Author: Filipe David Borba Manana <fdmanana@apache.org>
> Date:   Sun Nov 6 14:25:04 2011 +0000
>
>     More efficient communication with the view server
>
>     This change makes the communication between the Erlang VM and
>     an external view server (couchjs for e.g.) more efficient by
>     writing a series of commands into the port and reading all the
>     responses from the external view server after doing all those
>     writes. This minimizes the amount of time each endpoint spends
>     blocked reading from the port.
>
>     COUCHDB-1334
>
> & doing a clean build from 1.3.x just reverting this patch works on both
> my build boxes (this time :/).
>
> +0 on A in general given that many of us have been using 1.3.x actively
> for months on other platforms without issue.
>
> +1 for B.
>
>
> > CouchDB hangs during start of view indexing
> > -------------------------------------------
> >
> >                 Key: COUCHDB-1346
> >                 URL: https://issues.apache.org/jira/browse/COUCHDB-1346
> >             Project: CouchDB
> >          Issue Type: Bug
> >          Components: View Server Support
> >    Affects Versions: 1.3
> >         Environment: Windows 7 Enterprise only, not able to replicate on Mac OS X.
> > Erlang R14B03 + crypto patches.
> > Mozilla Javascript 1.8.5
> >            Reporter: Dave Cottlehuber
> >            Assignee: Adam Kocoloski
> >            Priority: Blocker
> >              Labels: Windows
> >             Fix For: 1.3
> >
> >
> > [info] [<0.20499.0>] Opening index for db: test_suite_db idx: f4421bf4e9c9bf2acb3db91bca9e9adc sig: "d5c87ad33242b181f86be2139cbccd96"
> > [info] [<0.20504.0>] Starting index update for db: test_suite_db idx: f4421bf4e9c9bf2acb3db91bca9e9adc
> > [info] [<0.20334.0>] 172.16.40.1 - - POST /test_suite_db/_temp_view 500
> > [info] [<0.20513.0>] 172.16.40.1 - - GET /_utils/couch_tests.html?script/couch_tests.js 200
> > [info] [<0.20514.0>] 172.16.40.1 - - GET /_utils/index.html 200
> > [info] [<0.20060.0>] 172.16.40.1 - - DELETE /test_suite_db_a/ 200
> > [info] [<0.20407.0>] 172.16.40.1 - - GET /test_suite_reports/ 404
> > [info] [<0.20058.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20071.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20069.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20484.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20364.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20062.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20388.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20345.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20072.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20059.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20061.0>] 172.16.40.1 - - DELETE /test_suite_db/ 404
> > [info] [<0.20472.0>] 172.16.40.1 - - DELETE /test_suite_db/ 200
> > [error] [<0.20050.0>] ** Generic server couch_index_server terminating
> > ** Last message in was {'$gen_cast',{reset_indexes,<<"test_suite_db">>}}
> > ** When Server state == {st,"../var/lib/couchdb"}
> > ** Reason for termination ==
> > ** {{case_clause,{error,eacces}},
> >     [{couch_file,'-nuke_dir/2-fun-0-',3},
> >      {lists,foreach,2},
> >      {couch_file,nuke_dir,2},
> >      {couch_index_server,handle_cast,2},
> >      {gen_server,handle_msg,5},
> >      {proc_lib,init_p_do_apply,3}]}
> > =ERROR REPORT==== 23-Nov-2011::21:17:14 ===
> > ** Generic server couch_index_server terminating
> > ** Last message in was {'$gen_cast',{reset_indexes,<<"test_suite_db">>}}
> > ** When Server state == {st,"../var/lib/couchdb"}
> > ** Reason for termination ==
> > ** {{case_clause,{error,eacces}},
> >     [{couch_file,'-nuke_dir/2-fun-0-',3},
> >      {lists,foreach,2},
> >      {couch_file,nuke_dir,2},
> >      {couch_index_server,handle_cast,2},
> >      {gen_server,handle_msg,5},
> >      {proc_lib,init_p_do_apply,3}]}
> > [error] [<0.20050.0>] {error_report,<0.19957.0>,
> >                           {<0.20050.0>,crash_report,
> >                            [[{initial_call,
> >                                  {couch_index_server,init,['Argument__1']}},
> >                              {pid,<0.20050.0>},
> >                              {registered_name,couch_index_server},
> >                              {error_info,
> >                                  {exit,
> >                                      {{case_clause,{error,eacces}},
> >                                       [{couch_file,'-nuke_dir/2-fun-0-',3},
> >                                        {lists,foreach,2},
> >                                        {couch_file,nuke_dir,2},
> >                                        {couch_index_server,handle_cast,2},
> >                                        {gen_server,handle_msg,5},
> >                                        {proc_lib,init_p_do_apply,3}]},
> >                                      [{gen_server,terminate,6},
> >                                       {proc_lib,init_p_do_apply,3}]}},
> >                              {ancestors,
> >                                  [couch_secondary_services,couch_server_sup,
> >                                   <0.19958.0>]},
> >                              {messages,
> >                                  [{'$gen_cast',
> >                                      {reset_indexes,<<"test_suite_db_a">>}}]},
> >                              {links,[<0.20051.0>,<0.20026.0>]},
> >                              {dictionary,[]},
> >                              {trap_exit,true},
> >                              {status,running},
> >                              {heap_size,1597},
> >                              {stack_size,24},
> >                              {reductions,12211}],
> >                             [{neighbour,
> >                                  [{pid,<0.20051.0>},
> >                                   {registered_name,[]},
> >                                   {initial_call,
> >                                       {couch_event_sup,init,['Argument__1']}},
> >                                   {current_function,{gen_server,loop,6}},
> >                                   {ancestors,
> >                                       [couch_index_server,
> >                                        couch_secondary_services,
> >                                        couch_server_sup,<0.19958.0>]},
> >                                   {messages,[]},
> >                                   {links,[<0.20050.0>,<0.20018.0>]},
> >                                   {dictionary,[]},
> >                                   {trap_exit,false},
> >                                   {status,waiting},
> >                                   {heap_size,233},
> >                                   {stack_size,9},
> >                                   {reductions,32}]}]]}}
> > =CRASH REPORT==== 23-Nov-2011::21:17:14 ===
> >   crasher:
> >     initial call: couch_index_server:init/1
> >     pid: <0.20050.0>
> >     registered_name: couch_index_server
> >     exception exit: {{case_clause,{error,eacces}},
> >                      [{couch_file,'-nuke_dir/2-fun-0-',3},
> >                       {lists,foreach,2},
> >                       {couch_file,nuke_dir,2},
> >                       {couch_index_server,handle_cast,2},
> >                       {gen_server,handle_msg,5},
> >                       {proc_lib,init_p_do_apply,3}]}
> >       in function  gen_server:terminate/6
> >     ancestors: [couch_secondary_services,couch_server_sup,<0.19958.0>]
> >     messages: [{'$gen_cast',{reset_indexes,<<"test_suite_db_a">>}}]
> >     links: [<0.20051.0>,<0.20026.0>]
> >     dictionary: []
> >     trap_exit: true
> >     status: running
> >     heap_size: 1597
> >     stack_size: 24
> >     reductions: 12211
> >   neighbours:
> >     neighbour: [{pid,<0.20051.0>},
> >                   {registered_name,[]},
> >                   {initial_call,{couch_event_sup,init,['Argument__1']}},
> >                   {current_function,{gen_server,loop,6}},
> >                   {ancestors,[couch_index_server,couch_secondary_services,
> >                               couch_server_sup,<0.19958.0>]},
> >                   {messages,[]},
> >                   {links,[<0.20050.0>,<0.20018.0>]},
> >                   {dictionary,[]},
> >                   {trap_exit,false},
> >                   {status,waiting},
> >                   {heap_size,233},
> >                   {stack_size,9},
> >                   {reductions,32}]
> > [error] [<0.20026.0>] {error_report,<0.19957.0>,
> >                           {<0.20026.0>,supervisor_report,
> >                            [{supervisor,{local,couch_secondary_services}},
> >                             {errorContext,child_terminated},
> >                             {reason,
> >                                 {{case_clause,{error,eacces}},
> >                                  [{couch_file,'-nuke_dir/2-fun-0-',3},
> >                                   {lists,foreach,2},
> >                                   {couch_file,nuke_dir,2},
> >                                   {couch_index_server,handle_cast,2},
> >                                   {gen_server,handle_msg,5},
> >                                   {proc_lib,init_p_do_apply,3}]}},
> >                             {offender,
> >                                 [{pid,<0.20050.0>},
> >                                  {name,index_server},
> >                                  {mfargs,{couch_index_server,start_link,[]}},
> >                                  {restart_type,permanent},
> >                                  {shutdown,brutal_kill},
> >                                  {child_type,worker}]}]}}
> > OS process tree at this time is:
> > Process information for SENDAI:
> > Name                             Pid Pri Thd  Hnd      VM      WS    Priv
> > Idle                               0   0   2    0       0      24       0
> >   System                           4   8  79  477    3380     304     108
> > explorer                        1984   8  21  664  213732   46340   21540
> >   cmd                           2104   8   1   25   48132    3304    2144
> >     pslist                      2776  13   1  133   63584    4976    2000
> >   cmd                           2504   8   1   26   44980    3512    3012
> >     werl                        2680   8  16  390  196232   40064   28628
> >       win32sysinfo              1152   8   1   21   12624    2124     640
> >       couchspawnkillable        1444   8   1   30   12992    2284     688
> >         couchjs                 1468   8   1   39   55900    6572    4056
> >       couchspawnkillable        2740   8   1   30   12992    2280     684
> >         couchjs                 2756   8   1   39   55900    7108    4444
> > Erlang resumes running CouchDB when couchjs procs are terminated with extreme
> > prejudice. The hang still occurs after reverting fdmanana's COUCHDB-1334
> > commit. This could be a race condition during invalidation of the views, and
> > subsequent deletion of the related ddoc view directory prior to reindexing.
> > On Windows a filesystem object cannot be deleted if there are open handles
> > remaining.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
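
The stack trace quoted above shows `couch_index_server` terminating with `{case_clause,{error,eacces}}` inside `couch_file:nuke_dir/2`, i.e. a delete returned an error that no clause matched. As a rough sketch of the alternative (written in Python rather than Erlang, and not a port of the CouchDB code), a recursive delete can retry on a permission error and report what it could not remove instead of crashing. The helper names `nuke_dir` and `_try_remove` and the retry parameters are hypothetical.

```python
import os
import time


def _try_remove(remove, target, retries, delay):
    """Call remove(target), retrying on permission errors.

    Returns True if the target is gone afterwards, False otherwise.
    """
    for _ in range(retries):
        try:
            remove(target)
            return True
        except FileNotFoundError:
            return True          # already gone, nothing left to do
        except PermissionError:
            # On Windows this is what a delete raises while another
            # process still holds an open handle; wait and try again.
            time.sleep(delay)
        except OSError:
            return False         # e.g. directory not empty after a failed child
    return False


def nuke_dir(path, retries=3, delay=0.05):
    """Recursively delete path and return the entries that could not be removed."""
    failed = []
    # Walk bottom-up so files are removed before their parent directories.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            target = os.path.join(root, name)
            if not _try_remove(os.remove, target, retries, delay):
                failed.append(target)
        for name in dirs:
            target = os.path.join(root, name)
            if not _try_remove(os.rmdir, target, retries, delay):
                failed.append(target)
    if os.path.isdir(path) and not _try_remove(os.rmdir, path, retries, delay):
        failed.append(path)
    return failed
```

A caller would check the returned list and decide whether to postpone the index reset, rather than letting the server process die. Retrying only papers over the race if the open handle is never released, so this does not address the root cause discussed in the thread.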
