couchdb-dev mailing list archives

From Dirkjan Ochtman <dirk...@ochtman.nl>
Subject Replication problems
Date Fri, 01 Oct 2010 13:19:30 GMT
Hi there,

I've been doing some replication today, and it's been problematic.

I have two servers running CouchDB 1.0.1: one is the main server in
the office, the other is the backup server in a rack somewhere in a
different city. Both servers are running Linux, and we have OpenVPN to
encrypt connections between them. We recently made some changes to our
main database (we deleted some 350000 documents from it, so we now
have 750000 left). We have continuous replication
running on the local server pulling from the remote server for two
auxiliary databases that we have, and it seems to mostly work fine.

However, replication initiated on the remote server from the local
server is proving problematic. I've been seeing errors like this:

** Reason for termination ==
** {http_request_failed,<<"failed to replicate http://10.8.0.12:5984/watt/">>}
[Fri, 01 Oct 2010 13:02:56 GMT] [error] [<0.12908.142>] {error_report,<0.32.0>,
    {<0.12908.142>,crash_report,
     [[{initial_call,{couch_rep_changes_feed,init,['Argument__1']}},
       {pid,<0.12908.142>},
       {registered_name,[]},
       {error_info,
           {exit,
               {http_request_failed,
                   <<"failed to replicate http://10.8.0.12:5984/watt/">>},
               [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}},
       {ancestors,
           [<0.12903.142>,couch_rep_sup,couch_primary_services,
            couch_server_sup,<0.33.0>]},
       {messages,
           [{ibrowse_async_response,
                {1285,937623,492504},
                {error,closing_on_request}},
            {'EXIT',<0.12909.142>,normal}]},
       {links,[]},
       {dictionary,[{timeout,{1285938140888213,#Ref<0.0.417.114136>}}]},
       {trap_exit,true},
       {status,running},
       {heap_size,75025},
       {stack_size,24},
       {reductions,22198607}],
      []]}}

I've been starting the replication on the remote; it will work for a
few thousand (or tens of thousands of) seqs, and then it will error out
again. This behavior is not new: I've been trying to set up continuous
replication for this database for a few weeks now, and it always just
bombs out after a while. Which is especially weird given that it seems
to work fine the other way around (though those databases are much
smaller and see fewer changes, on the order of 0.5M vs. 10M seqs).
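
As a stopgap while the cause is unknown, the manual restart cycle described
above could be automated. The following is only a minimal sketch, not a fix:
it assumes the standard POST /_replicate endpoint with "source" and "target"
fields, takes the source URL from the error log, and uses a hypothetical
target database name ("watt") and server address (http://localhost:5984).
The helper names are made up for illustration.

```python
import json
import time
import urllib.request


def build_replication_request(source, target, continuous=False):
    """Build the JSON body for POST /_replicate."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True
    return body


def post_replicate(server, body, timeout=3600):
    """Send the replication request to a CouchDB server and decode the reply.

    `server` is the base URL of the node that should run the replication
    (assumed here, e.g. http://localhost:5984 on the backup machine).
    """
    req = urllib.request.Request(
        server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def replicate_with_retry(send, body, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Call send(body) until it succeeds, backing off exponentially.

    Returns (attempt_number, response). HTTP and socket errors raised by
    urllib are subclasses of OSError; malformed JSON raises ValueError.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, send(body)
        except (OSError, ValueError) as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
                sleep(delay * 2 ** (attempt - 1))
    raise RuntimeError(
        "replication failed after %d attempts" % max_attempts
    ) from last_error


if __name__ == "__main__":
    # Pull from the office server into a local database on the backup server.
    request_body = build_replication_request("http://10.8.0.12:5984/watt/", "watt")
    attempt, reply = replicate_with_retry(
        lambda b: post_replicate("http://localhost:5984", b),
        request_body,
        delay=30.0,
    )
    print("succeeded on attempt", attempt, reply)
```

The sketch uses a one-shot (non-continuous) replication so that each request
either completes or fails visibly. Since replication records checkpoints, a
retry should normally resume from the last checkpoint instead of starting
over, though that is worth verifying against the logs.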

Is this a known problem? Anything we can do to find the cause?

Cheers,

Dirkjan
