incubator-couchdb-user mailing list archives

From Jason Smith <...@apache.org>
Subject Re: _replicator database durability
Date Tue, 24 Sep 2013 15:42:20 GMT
On Tue, Sep 24, 2013 at 10:18 PM, Alexey Elfman <elf2001@gmail.com> wrote:
> Hello,
>
> I'm using CouchDB for our company's billing platform.
>
> We have 4 dedicated servers (32-64 GB of RAM, 3-8 TB of disks with SSD
> cache) in the same datacenter.
> All servers serve the same set of databases (about 40 databases per machine)
> with all-to-all replication via the _replicator database.
>
> The databases vary widely, from a few documents to several hundred
> million documents. Two databases are 500 GB+. Documents are simple, without
> complex structure and almost no attachments.
>
> We have an application to maintain all of these replications, and here is
> why: we regularly see unpredictable replication failures.
> For example, a document in the _replicator database can have status =
> "triggered" while no task with that data exists on the server at that
> moment.
> Or a document can be missing its "source" field for a few minutes every
> day on every server.
>
> Replications crash every hour due to unclear errors like "source database
> is out of sync, please increase max_dbs_open". max_dbs_open is 800 on every
> server and there are fewer than 50 databases, so even 50 databases times 3
> replications each is below the limit.
>
> Creating documents in the _replicator database is hard too. Example:
>
>
> # first, deleting old one
> [Fri, 20 Sep 2013 15:41:21 GMT] [info] [<0.24052.0>] 83.240.73.210 - -
> DELETE /_replicator/example.com_db?rev=10-89450b554d11bf9a6d7e15a136ae663f
> 200
>
> # deleted
> [Fri, 20 Sep 2013 15:41:24 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET
> /_replicator/example.com_db?revs_info=true 404
>
> # creating new one with same id
> [Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25844.0>] 176.9.143.85 - - HEAD
> /_replicator/example.com_db 404
> # seems created
> [Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25845.0>] 176.9.143.85 - - PUT
> /_replicator/example.com_db 201
>
> # where is it?..
> [Fri, 20 Sep 2013 15:41:51 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET
> /_replicator/example.com_db?revs_info=true 404

I have seen this too. The only thing I can guess is that if you create
a document that is identical to one that was already deleted, then it
remains deleted.

Imagine a create, an update, and a delete:

doc@1 -> doc@2 -> doc@3 (_deleted)

Now suppose I create doc@1 again, identical to the first: every
key/val is the same as before, so the _rev is identical, since
_rev is essentially a checksum of all the key/vals.

doc@1 -> [CouchDB helpfully says "oh, that has already been deleted, so
'fast forward'"] -> doc@3 (still _deleted)

When you replicate doc, this is what you want (old revisions from the
source do not magically come back to life on the target).
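The principle can be sketched in a few lines. To be clear, CouchDB's actual revision algorithm is not this (it hashes a term-encoded document body internally); the code below is only a simplified model showing why two byte-identical documents get the same rev, and why a unique field breaks the tie:

```python
import hashlib
import json

def fake_rev(doc):
    # Simplified stand-in for CouchDB's deterministic rev: hash the
    # canonicalized document body. Identical contents -> identical rev.
    body = json.dumps(doc, sort_keys=True).encode()
    return hashlib.md5(body).hexdigest()

# Two byte-identical replication docs yield the same rev...
doc_a = {"source": "http://a/db", "target": "http://b/db"}
doc_b = {"source": "http://a/db", "target": "http://b/db"}
assert fake_rev(doc_a) == fake_rev(doc_b)

# ...so adding a unique field such as created_at forces a fresh rev.
doc_b["created_at"] = "2013-09-24T15:28:12"
assert fake_rev(doc_a) != fake_rev(doc_b)
```

With an identical rev, the recreated document collides with the deleted revision tree and stays "fast-forwarded" to the tombstone; with a unique field, it gets a genuinely new rev.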

The workaround I have found is to force a unique _rev every time. In my
case, I just add "created_at":"2013-09-24T15:28:12" to my replication
docs. You could also use a UUID.

Happily, this will not change the replication ID. The timestamp value
is ignored. (Although maybe I could use it later as an audit or
something.)
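A replication document with that workaround might look like the following sketch (the _id, URLs, and timestamp are made up for illustration):

```
{
  "_id": "example.com_db",
  "source": "https://source.example.com/db",
  "target": "https://target.example.com/db",
  "continuous": true,
  "created_at": "2013-09-24T15:28:12"
}
```

The created_at field is not part of the replication; it exists only to make the document body, and therefore the _rev, unique on every PUT.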

Side note, since you seem to be serious about replicating. If you *do*
want to change the replication ID (force a complete restart) then you
must change either the (a) source, (b) target, (c) filter, or (d)
query_params.

Usually you cannot change (a), (b), or (c). So once again you can drop
a timestamp or UUID into query_params. HOWEVER, query_params only
affects the replication ID if you ALSO have a filter option.

So in other words: you need a no-op filter just so that you can add
no-op query_params to force a new replication ID.

function(doc, req) {
  // A no-op filter; req.query.created_at is present but I don't care.
  return true;
}
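Putting it together, a replication document that pairs the no-op filter with a throwaway query_params value might look like this (the design doc name "replicator", the filter name "noop", and all URLs are illustrative; the filter must live in a design document on the source database):

```
{
  "_id": "example.com_db",
  "source": "https://source.example.com/db",
  "target": "https://target.example.com/db",
  "continuous": true,
  "filter": "replicator/noop",
  "query_params": { "created_at": "2013-09-24T15:28:12" }
}
```

Changing the created_at value inside query_params now changes the replication ID, which forces a complete restart from sequence zero.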

However, once again, since you are serious about replicating and you
already took the trouble to write a filter function, you may as well
log stuff to help you troubleshoot later.

function(doc, req) {
  // A logging filter; req.query comes from the document's .query_params object.
  // In my own code, I put .source and .target in my .query_params object
  // so I can log them.
  var id_and_rev = doc._id + "@" + doc._rev;
  var source = req.query.source || '(unknown source)';
  var target = req.query.target || '(unknown target)';
  var dir = source + " -> " + target;

  log('Replicate ' + dir + ': ' + id_and_rev);
  return true;
}

>
>
> # next try, creating
> [Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27720.0>] 176.9.143.85 - - HEAD
> /_replicator/example.com_db 404
> # and now it created
> [Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27730.0>] 176.9.143.85 - - PUT
> /_replicator/example.com_db 201
>
> # because replication starts successfully

Yeah, no idea there. Once I applied my created_at trick, I had worked
around this problem for myself, and I moved on to other problems.
