couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Filipe David Manana <fdman...@apache.org>
Subject Re: Issues blocking 1.2.0 release
Date Mon, 23 Jan 2012 01:34:54 GMT
Noah, was able to reproduce your issue by tweaking the test to create
more leaf revisions for a document:

diff --git a/test/etap/242-replication-many-leaves.t
b/test/etap/242-replication-many-leaves.t
index d8d3eb9..4eb4765 100755
--- a/test/etap/242-replication-many-leaves.t
+++ b/test/etap/242-replication-many-leaves.t
@@ -56,7 +56,7 @@ doc_ids() ->

 doc_num_conflicts(<<"doc1">>) -> 10;
 doc_num_conflicts(<<"doc2">>) -> 100;
-doc_num_conflicts(<<"doc3">>) -> 286.
+doc_num_conflicts(<<"doc3">>) -> 500.


 main(_) ->


With that change, I get exactly the same timeout as you get when the
test runs the push replication. It turns out that some _bulk_docs
requests are taking more than 30 seconds (default replication
connection timeout) therefore the replication request retry messages.
Verified this by timing the _bulk_docs handler to log the time it
takes:

diff --git a/src/couchdb/couch_httpd_db.erl b/src/couchdb/couch_httpd_db.erl
index d7ecb4a..442571d 100644
--- a/src/couchdb/couch_httpd_db.erl
+++ b/src/couchdb/couch_httpd_db.erl
@@ -297,6 +297,7 @@
db_req(#httpd{path_parts=[_,<<"_ensure_full_commit">>]}=Req, _Db) ->
     send_method_not_allowed(Req, "POST");

 db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
+    T0 = now(),
     couch_stats_collector:increment({httpd, bulk_requests}),
     couch_httpd:validate_ctype(Req, "application/json"),
     {JsonProps} = couch_httpd:json_body_obj(Req),
@@ -357,7 +358,9 @@
db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db)
->
             {ok, Errors} = couch_db:update_docs(Db, Docs, Options,
replicated_changes),
             ErrorsJson =
                 lists:map(fun update_doc_result_to_json/1, Errors),
-            send_json(Req, 201, ErrorsJson)
+            Rr = send_json(Req, 201, ErrorsJson),
+            ?LOG_ERROR("BULK DOCS took ~p ms~n",
[timer:now_diff(now(), T0) / 1000]),
+            Rr
         end
     end;
 db_req(#httpd{path_parts=[_,<<"_bulk_docs">>]}=Req, _Db) ->


I was getting _bulk_docs response times after 50 seconds.

This convinces me there's nothing wrong with the codebase, the
timeouts just needs to be increased:

diff --git a/test/etap/242-replication-many-leaves.t
b/test/etap/242-replication-many-leaves.t
index d8d3eb9..6508112 100755
--- a/test/etap/242-replication-many-leaves.t
+++ b/test/etap/242-replication-many-leaves.t
@@ -77,6 +77,7 @@ test() ->
     couch_server_sup:start_link(test_util:config_files()),
     ibrowse:start(),
     crypto:start(),
+    couch_config:set("replicator", "connection_timeout", "90000", false),

     Pairs = [
         {source_db_name(), target_db_name()},
@@ -287,6 +288,6 @@ replicate(Source, Target) ->
     receive
     {'DOWN', MonRef, process, Pid, Reason} ->
         etap:is(Reason, normal, "Replication finished successfully")
-    after 300000 ->
+    after 900000 ->
         etap:bail("Timeout waiting for replication to finish")
     end.

Alternatively the test can be updated to create less revisions for the
document doc3. The current revisions # is 286 but for the tests'
purpose 205+ is enough, which should make it faster - 7000 (max url
length) / length(DocRevision) = 205

If it's ok for you, updating the timeouts plus reducing the # from 286
to 210 is fine for me.



On Mon, Jan 23, 2012 at 12:00 AM, Noah Slater <nslater@tumbolia.org> wrote:
> I'm just the dumb QA guy.
>
> If you have some diagnostics you want me to run on my machine, I am happy
> to.
>
> On Sun, Jan 22, 2012 at 11:31 PM, Filipe David Manana
> <fdmanana@apache.org>wrote:
>
>> On Sun, Jan 22, 2012 at 7:20 PM, Noah Slater <nslater@tumbolia.org> wrote:
>> > Works. How do we proceed?
>>
>> For how much time does the test runs? On 2 different physical
>> machines, it takes about 1 minute and 10 seconds for me.
>>
>> Perhaps some manual replication tests could confirm if there's
>> something wrong with the codebase, your environment or if simply
>> increasing the timeout is not alarming.
>>
>> >
>> > On Sun, Jan 22, 2012 at 7:05 PM, Noah Slater <nslater@tumbolia.org>
>> wrote:
>> >
>> >> OVAR 9000! (Testing now...)
>> >>
>> >>
>> >> On Sun, Jan 22, 2012 at 6:56 PM, Filipe David Manana <
>> fdmanana@apache.org>wrote:
>> >>
>> >>> On Sun, Jan 22, 2012 at 6:47 PM, Noah Slater <nslater@tumbolia.org>
>> >>> wrote:
>> >>> > No change, still fails.
>> >>>
>> >>> Noah, to try to find out if it's due to slowness of the machine or
>> >>> some other issue, do you think you can try to increase the following
>> >>> timeout in the test?
>> >>>
>> >>> diff --git a/test/etap/242-replication-many-leaves.t
>> >>> b/test/etap/242-replication-many-leaves.t
>> >>> index d8d3eb9..737cd31 100755
>> >>> --- a/test/etap/242-replication-many-leaves.t
>> >>> +++ b/test/etap/242-replication-many-leaves.t
>> >>> @@ -287,6 +287,6 @@ replicate(Source, Target) ->
>> >>>     receive
>> >>>     {'DOWN', MonRef, process, Pid, Reason} ->
>> >>>         etap:is(Reason, normal, "Replication finished successfully")
>> >>> -    after 300000 ->
>> >>> +    after 900000 ->
>> >>>         etap:bail("Timeout waiting for replication to finish")
>> >>>     end.
>> >>>
>> >>> >
>> >>> > On Sun, Jan 22, 2012 at 6:08 PM, Noah Slater <nslater@tumbolia.org>
>> >>> wrote:
>> >>> >
>> >>> >>
>> >>> >> On Sun, Jan 22, 2012 at 6:01 PM, Filipe David Manana <
>> >>> fdmanana@apache.org>wrote:
>> >>> >>
>> >>> >>> Noah, does it fail occasionally or every time for you?
>> >>> >>>
>> >>> >>
>> >>> >> Fails every time.
>> >>> >>
>> >>> >>
>> >>> >>> I'm assuming you're with a slow machine or the machine
is a bit
>> >>> >>> overloaded.
>> >>> >>>
>> >>> >>
>> >>> >> Shouldn't be, I'm not doing anything else right now, and this
is a
>> new
>> >>> MBA.
>> >>> >>
>> >>> >>
>> >>> >>> Can you try with the following patch?
>> >>> >>>
>> >>> >>
>> >>> >> Yes. Will report back.
>> >>> >>
>> >>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Filipe David Manana,
>> >>>
>> >>> "Reasonable men adapt themselves to the world.
>> >>>  Unreasonable men adapt the world to themselves.
>> >>>  That's why all progress depends on unreasonable men."
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Mime
View raw message