couchdb-dev mailing list archives

From "Filipe Manana (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1212) Newly created user accounts cannot sign-in after _user database crashes
Date Thu, 01 Dec 2011 12:32:40 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160859#comment-13160859 ]

Filipe Manana commented on COUCHDB-1212:
----------------------------------------

Benoit,

"I'm not sure it's fine to add infinity timeout like this. we open the door to future issues
imo."

Have you observed any specific issues with the infinity timeouts here?

"It's generally not good to wait infinitely for a task to finish. I think we should rather investigate
why it takes so long here."

Benoit, these timeouts happen because the system is under heavy load. An easy way to reproduce
this is to compact a really large database (10 or 20 GB, especially without SSDs) while a
client keeps opening a database (it can be any other database) or querying _active_tasks
(couch_task_status:all/0 does not use an infinity timeout either), at the point where the old
database file is being deleted asynchronously.
I certainly observe these timeouts during the delete phase on Linux with ext4 and an HDD.
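
A minimal sketch of that reproduction, written in Python so it can run from any client machine. The base URL, the `fbm` database name (borrowed from the log quoted below), the polling interval, and all helper names are assumptions for illustration. Compaction may also require admin credentials, which this sketch does not send. Client-side HTTP latency is only a rough proxy for the internal call; the 5-second threshold mirrors the default gen_server:call timeout of 5000 ms.

```python
import time
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # assumption: local CouchDB on its default port
DEFAULT_CALL_TIMEOUT_S = 5.0        # gen_server:call/2 defaults to 5000 ms


def compact_request(base_url, db_name):
    """Build POST /{db}/_compact (no credentials attached in this sketch)."""
    return urllib.request.Request(
        f"{base_url}/{db_name}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_get(url, timeout_s=60.0):
    """GET a URL; return elapsed seconds, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
    except OSError:  # URLError, HTTPError and socket timeouts derive from OSError
        return None
    return time.monotonic() - start


def slow_calls(latencies, threshold_s=DEFAULT_CALL_TIMEOUT_S):
    """Keep only the latencies that exceeded the threshold."""
    return [t for t in latencies if t is not None and t > threshold_s]


def run_repro(db_name="fbm", rounds=600, pause_s=1.0):
    """Start compaction, then poll /_active_tasks; return (slow latencies, failures)."""
    urllib.request.urlopen(compact_request(BASE_URL, db_name)).close()
    latencies = []
    for _ in range(rounds):
        latencies.append(timed_get(f"{BASE_URL}/_active_tasks"))
        time.sleep(pause_s)
    failed = sum(1 for t in latencies if t is None)
    return slow_calls(latencies), failed
```

Calling `run_repro()` against a test instance that holds a large database should show whether responses slow down or fail while the old file is being deleted; nothing runs until that function is called.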

"And maybe introduce another system if the queue is not enough."

What do you mean? What system, what queue?

"Like removing the system of ref counts?"

Are you suggesting removing the database ref counters? Unless you have an alternative system
to handle MVCC snapshots and compaction, that seems like a really bad idea.
                
> Newly created user accounts cannot sign-in after _user database crashes 
> ------------------------------------------------------------------------
>
>                 Key: COUCHDB-1212
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1212
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core, HTTP Interface
>    Affects Versions: 1.0.2
>         Environment: Ubuntu 10.10, Erlang R14B02 (erts-5.8.3)
>            Reporter: Jan van den Berg
>            Priority: Critical
>              Labels: _users, authentication
>         Attachments: couchdb-1212.patch
>
>
> We have one CouchDB database (4.5 GB) and we use the default _users database to store user accounts for a website. Once a week we need to restart CouchDB because newly signed-up users cannot log in any more: they get an HTTP status code 401 from the _session HTTP interface. We update and compact the database three times a day.
> This is the stack trace I see in the CouchDB log prior to when these issues occur.
> ----------- couch.log ---------------
> [Wed, 29 Jun 2011 22:02:46 GMT] [info] [<0.117.0>] Starting compaction for db "fbm"
> [Wed, 29 Jun 2011 22:02:46 GMT] [info] [<0.5753.79>] 127.0.0.1 - - 'POST' /fbm/_compact 202
> [Wed, 29 Jun 2011 22:02:46 GMT] [info] [<0.5770.79>] 127.0.0.1 - - 'POST' /fbm/_view_cleanup 202
> [Wed, 29 Jun 2011 22:10:19 GMT] [info] [<0.5773.79>] 86.9.246.184 - - 'GET' /_session 200
> [Wed, 29 Jun 2011 22:24:39 GMT] [info] [<0.6236.79>] 85.28.105.161 - - 'GET' /_session 200
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.84.0>] ** Generic server couch_server terminating
> ** Last message in was {open,<<"fbm">>,
>                              [{user_ctx,{user_ctx,null,[],undefined}}]}
> ** When Server state == {server,"/opt/couchbase-server/var/lib/couchdb",
>                             {re_pattern,0,0,
>                                 <<69,82,67,80,116,0,0,0,16,0,0,0,1,0,0,0,0,0,
>                                   0,0,0,0,0,0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                   0,93,0,72,25,77,0,0,0,0,0,0,0,0,0,0,0,0,254,
>                                   255,255,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                   77,0,0,0,0,16,171,255,3,0,0,0,128,254,255,
>                                   255,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,69,26,
>                                   84,0,72,0>>},
>                             100,2,"Sat, 18 Jun 2011 14:00:44 GMT"}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,[<0.116.0>,{open_ref_count,<0.10417.79>}]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.84.0>] {error_report,<0.31.0>,
>     {<0.84.0>,crash_report,
>      [[{initial_call,{couch_server,init,['Argument__1']}},
>        {pid,<0.84.0>},
>        {registered_name,couch_server},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.116.0>,{open_ref_count,<0.10417.79>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {ancestors,[couch_primary_services,couch_server_sup,<0.32.0>]},
>        {messages,[]},
>        {links,[<0.91.0>,<0.483.0>,<0.116.0>,<0.79.0>]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,6765},
>        {stack_size,24},
>        {reductions,206710598}],
>       []]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.79.0>] {error_report,<0.31.0>,
>     {<0.79.0>,supervisor_report,
>      [{supervisor,{local,couch_primary_services}},
>       {errorContext,child_terminated},
>       {reason,
>           {timeout,
>               {gen_server,call,[<0.116.0>,{open_ref_count,<0.10417.79>}]}}},
>       {offender,
>           [{pid,<0.84.0>},
>            {name,couch_server},
>            {mfargs,{couch_server,sup_start_link,[]}},
>            {restart_type,permanent},
>            {shutdown,1000},
>            {child_type,worker}]}]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.91.0>] ** Generic server <0.91.0> terminating
> ** Last message in was {'EXIT',<0.84.0>,
>                            {timeout,
>                                {gen_server,call,
>                                    [<0.116.0>,
>                                     {open_ref_count,<0.10417.79>}]}}}
> ** When Server state == {db,<0.91.0>,<0.92.0>,nil,<<"1308405644393791">>,
>                             <0.90.0>,<0.94.0>,
>                             {db_header,5,91,0,
>                                 {378285,{30,9}},
>                                 {380466,39},
>                                 nil,0,nil,nil,1000},
>                             91,
>                             {btree,<0.90.0>,
>                                 {378285,{30,9}},
>                                 #Fun<couch_db_updater.7.10053969>,
>                                 #Fun<couch_db_updater.8.35220795>,
>                                 #Fun<couch_btree.5.124754102>,
>                                 #Fun<couch_db_updater.9.107593676>},
>                             {btree,<0.90.0>,
>                                 {380466,39},
>                                 #Fun<couch_db_updater.10.30996817>,
>                                 #Fun<couch_db_updater.11.96515267>,
>                                 #Fun<couch_btree.5.124754102>,
>                                 #Fun<couch_db_updater.12.117826253>},
>                             {btree,<0.90.0>,nil,#Fun<couch_btree.0.83553141>,
>                                 #Fun<couch_btree.1.30790806>,
>                                 #Fun<couch_btree.2.124754102>,nil},
>                             91,<<"_users">>,
>                             "/opt/couchbase-server/var/lib/couchdb/_users.couch",
>                             [#Fun<couch_doc.7.50754398>],
>                             [],nil,
>                             {user_ctx,null,[],undefined},
>                             nil,1000,
>                             [before_header,after_header,on_file_open],
>                             true}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,[<0.116.0>,{open_ref_count,<0.10417.79>}]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.91.0>] {error_report,<0.31.0>,
>     {<0.91.0>,crash_report,
>      [[{initial_call,{couch_db,init,['Argument__1']}},
>        {pid,<0.91.0>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.116.0>,{open_ref_count,<0.10417.79>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {ancestors,[<0.89.0>]},
>        {messages,[]},
>        {links,[]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,610},
>        {stack_size,24},
>        {reductions,8797798}],
>       []]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [info] [<0.300.0>] Shutting down view group server, monitored db is closing.
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.10417.79>] Uncaught error in HTTP request: {exit,
>                                  {{timeout,
>                                    {gen_server,call,
>                                     [<0.116.0>,
>                                      {open_ref_count,<0.10417.79>}]}},
>                                   {gen_server,call,
>                                    [couch_server,
>                                     {open,<<"fbm">>,
>                                      [{user_ctx,
>                                        {user_ctx,null,[],undefined}}]},
>                                     infinity]}}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.483.0>] ** Generic server <0.483.0> terminating
> ** Last message in was {'EXIT',<0.84.0>,
>                            {timeout,
>                                {gen_server,call,
>                                    [<0.116.0>,
>                                     {open_ref_count,<0.10417.79>}]}}}
> ** When Server state == {db,<0.483.0>,<0.484.0>,nil,<<"1308405937993370">>,
>                             <0.4643.19>,<0.4645.19>,
>                             {db_header,5,890453,0,
>                                 {3279126950,{752003,0}},
>                                 {3279118313,752003},
>                                 {3279132318,[]},
>                                 0,nil,3279127184,1000},
>                             890453,
>                             {btree,<0.4643.19>,
>                                 {3279126950,{752003,0}},
>                                 #Fun<couch_db_updater.7.10053969>,
>                                 #Fun<couch_db_updater.8.35220795>,
>                                 #Fun<couch_btree.5.124754102>,
>                                 #Fun<couch_db_updater.9.107593676>},
>                             {btree,<0.4643.19>,
>                                 {3279118313,752003},
>                                 #Fun<couch_db_updater.10.30996817>,
>                                 #Fun<couch_db_updater.11.96515267>,
>                                 #Fun<couch_btree.5.124754102>,
>                                 #Fun<couch_db_updater.12.117826253>},
>                             {btree,<0.4643.19>,
>                                 {3279132318,[]},
>                                 #Fun<couch_btree.0.83553141>,
>                                 #Fun<couch_btree.1.30790806>,
>                                 #Fun<couch_btree.2.124754102>,nil},
>                             890453,<<"fbm_full">>,
>                             "/opt/couchbase-server/var/lib/couchdb/fbm_full.couch",
>                             [#Fun<couch_doc.7.50754398>],
>                             [{<<"admins">>,
>                               {[{<<"names">>,[]},
>                                 {<<"roles">>,[<<"import">>]}]}},
>                              {<<"readers">>,
>                               {[{<<"names">>,[]},{<<"roles">>,[]}]}}],
>                             3279127184,
>                             {user_ctx,null,[],undefined},
>                             nil,1000,
>                             [before_header,after_header,on_file_open],
>                             false}
> ** Reason for termination == 
> ** {timeout,{gen_server,call,[<0.116.0>,{open_ref_count,<0.10417.79>}]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [error] [<0.483.0>] {error_report,<0.31.0>,
>     {<0.483.0>,crash_report,
>      [[{initial_call,{couch_db,init,['Argument__1']}},
>        {pid,<0.483.0>},
>        {registered_name,[]},
>        {error_info,
>            {exit,
>                {timeout,
>                    {gen_server,call,
>                        [<0.116.0>,{open_ref_count,<0.10417.79>}]}},
>                [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
>        {ancestors,[<0.480.0>]},
>        {messages,[]},
>        {links,[]},
>        {dictionary,[]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,6765},
>        {stack_size,24},
>        {reductions,1389}],
>       []]}}
> [Wed, 29 Jun 2011 22:25:06 GMT] [info] [<0.2984.19>] Shutting down view group server, monitored db is closing.
> [Wed, 29 Jun 2011 22:25:06 GMT] [info] [<0.10417.79>] Stacktrace: [{gen_server,call,3},
>              {couch_server,open,2},
>              {couch_db,open,2},
>              {couch_httpd_db,do_db_req,2},
>              {couch_httpd,handle_request_int,5},
>              {mochiweb_http,headers,5},
>              {proc_lib,init_p_do_apply,3}]
> ------- end --------
> Here's the log from when I signed in as an admin, created a new user, and tried to sign in as the newly created user.
> ------ couch.log -------
> [Fri, 01 Jul 2011 18:37:16 GMT] [info] [<0.20439.91>] 93.92.103.118 - - 'POST' /_session 200
> [Fri, 01 Jul 2011 18:37:16 GMT] [info] [<0.20457.91>] checkpointing view update at seq 91 for _users _design/_auth
> [Fri, 01 Jul 2011 18:37:16 GMT] [info] [<0.20439.91>] 93.92.103.118 - - 'GET' /_users/_design/_auth/_list/secure/users 200
> [Fri, 01 Jul 2011 18:38:35 GMT] [info] [<0.20456.91>] 93.92.103.118 - - 'PUT' /_users/org.couchdb.user:example@mail.com 201
> [Fri, 01 Jul 2011 18:38:35 GMT] [info] [<0.20457.91>] checkpointing view update at seq 92 for _users _design/_auth
> [Fri, 01 Jul 2011 18:38:35 GMT] [info] [<0.20456.91>] 93.92.103.118 - - 'GET' /_users/_design/_auth/_list/secure/users 200
> [Fri, 01 Jul 2011 18:38:47 GMT] [info] [<0.20456.91>] 93.92.103.118 - - 'GET' /_users/_design/_auth/_list/secure/users?key=%22org.couchdb.user:example@mail.com%22 200
> [Fri, 01 Jul 2011 18:38:47 GMT] [info] [<0.20456.91>] 93.92.103.118 - - 'PUT' /_users/org.couchdb.user:example@mail.com 201
> [Fri, 01 Jul 2011 18:39:00 GMT] [info] [<0.20547.91>] 93.92.103.118 - - 'GET' /_session 200
> [Fri, 01 Jul 2011 18:39:01 GMT] [info] [<0.20547.91>] 93.92.103.118 - - 'GET' /fbm/_design/api/_list/secure/competitions 200
> [Fri, 01 Jul 2011 18:39:12 GMT] [info] [<0.20547.91>] 93.92.103.118 - - 'POST' /_session 401
> [Fri, 01 Jul 2011 18:39:22 GMT] [info] [<0.20547.91>] 93.92.103.118 - - 'POST' /_session 401
> ------- end ---------
>  
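
The sign-in sequence in the second log (create a user document, then POST to /_session) can be sketched roughly as below. This is an illustration under assumptions, not the reporter's client code: the helper names are hypothetical, the base URL is supplied by the caller, and the user document layout follows the pre-1.2 scheme as I understand it, where `password_sha` is the SHA-1 hex digest of the password concatenated with the salt.

```python
import hashlib
import urllib.error
import urllib.parse
import urllib.request


def user_doc(name, password, salt):
    """Build a 1.0.x-style user document (assumed layout, see lead-in)."""
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return {
        "_id": f"org.couchdb.user:{name}",
        "type": "user",
        "name": name,
        "roles": [],
        "password_sha": digest,
        "salt": salt,
    }


def session_request(base_url, name, password):
    """Build the form-encoded POST /_session request used for cookie sign-in."""
    body = urllib.parse.urlencode({"name": name, "password": password})
    return urllib.request.Request(
        f"{base_url}/_session",
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def sign_in_status(base_url, name, password, timeout_s=30.0):
    """Return the HTTP status of a sign-in attempt (200 on success, 401 on refusal)."""
    try:
        with urllib.request.urlopen(
            session_request(base_url, name, password), timeout=timeout_s
        ) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

After the couch_server crash described above, a check such as `sign_in_status(base_url, name, password)` for a freshly created user would be expected to return 401 until CouchDB is restarted, matching the last two log lines.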

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
