incubator-couchdb-user mailing list archives

From Alexey Elfman <elf2...@gmail.com>
Subject _replicator database durability
Date Tue, 24 Sep 2013 15:18:45 GMT
Hello,

I'm using CouchDB for our company's billing platform.

We have 4 dedicated servers (32-64 GB of RAM, 3-8 TB of disk with SSD
cache) in the same datacenter.
All servers serve the same set of databases (about 40 databases per machine),
with all-to-all replication via the _replicator database.

The databases vary in size, from a few documents to several hundred
million documents. Two databases are 500 GB+. Documents are simple, without
complex structure and with almost no attachments.

We have an application to maintain all of these replications, and here is why:
we regularly see unpredictable replication failures.
For example, a document in the _replicator database can have status =
"triggered" while no task with matching data is running on the server at
that moment.
Or a document can even lack its "source" field for a few minutes, every day,
on every server.
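
The consistency check described above can be sketched roughly as follows. This is only an illustration, not our actual application: the function name find_suspect_docs is hypothetical, and it assumes the state is stored in the document's "_replication_state" field and that replication entries in /_active_tasks carry a "doc_id" field, which should be verified against the CouchDB version in use.

```python
def find_suspect_docs(replicator_docs, active_tasks):
    """Return (orphaned, no_source) lists of _replicator document ids.

    orphaned:  docs marked "triggered" with no matching replication task.
    no_source: docs that currently lack a "source" field.
    Assumes tasks expose "type" and "doc_id" (an assumption to verify).
    """
    # Document ids that currently have a running replication task.
    running = {
        task.get("doc_id")
        for task in active_tasks
        if task.get("type") == "replication"
    }

    orphaned = []
    no_source = []
    for doc in replicator_docs:
        doc_id = doc.get("_id", "")
        if doc_id.startswith("_design/"):
            continue  # skip the design document stored in _replicator
        if "source" not in doc:
            no_source.append(doc_id)
        if doc.get("_replication_state") == "triggered" and doc_id not in running:
            orphaned.append(doc_id)
    return orphaned, no_source


if __name__ == "__main__":
    # Hand-written sample data, not real server output.
    docs = [
        {"_id": "a", "source": "http://h/a", "target": "a",
         "_replication_state": "triggered"},
        {"_id": "b", "source": "http://h/b", "target": "b",
         "_replication_state": "triggered"},
        {"_id": "c", "target": "c"},
    ]
    tasks = [{"type": "replication", "doc_id": "a"}]
    print(find_suspect_docs(docs, tasks))
```

In practice the two inputs would come from GET /_replicator/_all_docs?include_docs=true (taking the "doc" of each row) and GET /_active_tasks on the same server.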

Replications crash every hour with unclear errors like "source database
is out of sync, please increase max_dbs_open". max_dbs_open is 800 on every
server and there are fewer than 50 databases. So even 50 databases multiplied
by 3 replications each (150) is well below the limit.

Creating documents in the _replicator database is unreliable too. Example:


# first, deleting old one
[Fri, 20 Sep 2013 15:41:21 GMT] [info] [<0.24052.0>] 83.240.73.210 - -
DELETE /_replicator/example.com_db?rev=10-89450b554d11bf9a6d7e15a136ae663f
200

# deleted
[Fri, 20 Sep 2013 15:41:24 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET
/_replicator/example.com_db?revs_info=true 404

# creating new one with same id
[Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25844.0>] 176.9.143.85 - - HEAD
/_replicator/example.com_db 404
# seems to be created
[Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25845.0>] 176.9.143.85 - - PUT
/_replicator/example.com_db 201

# where is it?..
[Fri, 20 Sep 2013 15:41:51 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET
/_replicator/example.com_db?revs_info=true 404


# next try, creating
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27720.0>] 176.9.143.85 - - HEAD
/_replicator/example.com_db 404
# and now it is created
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27730.0>] 176.9.143.85 - - PUT
/_replicator/example.com_db 201

# because replication starts successfully
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.111.0>] Attempting to start
replication `c0071dc985cbc3df3a225a6d75f0be7b+continuous` (document
`example.com_db`).
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.28092.0>] Document
`example.com_db` triggered replication
`c0071dc985cbc3df3a225a6d75f0be7b+continuous`
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.28090.0>] starting new
replication `c0071dc985cbc3df3a225a6d75f0be7b+continuous` at <0.28092.0>
(`http://replica:*****@example.com:5984/db/` -> `db`)
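
The delete-then-recreate sequence in the log above, with a read-back check and a retry, could be sketched like this. It is a workaround sketch under stated assumptions, not an explanation of why the first PUT vanished: the names http_json and recreate_doc, the retry count, and the delay are all hypothetical, and the only HTTP calls used are the GET, DELETE ?rev= and PUT requests already visible in the log.

```python
import json
import time
import urllib.error
import urllib.request


def http_json(method, url, body=None):
    """Send one JSON request and return (status_code, parsed_body)."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8") or "null")
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read().decode("utf-8") or "null")


def recreate_doc(doc_url, new_doc, request=http_json, attempts=3, delay=2.0):
    """Delete the document if present, PUT new_doc, and read it back.

    Returns the number of PUT attempts that were needed. Raises
    RuntimeError if the document is still missing after `attempts` tries.
    """
    status, current = request("GET", doc_url)
    if status == 200:
        # Delete the existing revision first, as in the log.
        request("DELETE", doc_url + "?rev=" + current["_rev"])

    for attempt in range(1, attempts + 1):
        status, _ = request("PUT", doc_url, new_doc)
        if status in (201, 202):
            time.sleep(delay)  # give the server a moment before checking
            status, _ = request("GET", doc_url)
            if status == 200:
                return attempt  # the document is really there
    raise RuntimeError("document still missing after %d attempts" % attempts)
```

The request parameter exists only so the logic can be exercised without a server. Against a real node, a call would look like recreate_doc("http://localhost:5984/_replicator/example.com_db", {"source": "...", "target": "db", "continuous": True}), where the host and the document body are placeholders.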

Does anyone use CouchDB in a similar manner? Am I the only one
experiencing such problems?

P.S. We are using CouchDB 1.3.1 and 1.4.0 on Gentoo Linux.

-- 
----------------
Best regards
Alexey Elfman
mailto:elf2001@gmail.com
