couchdb-user mailing list archives

From Alexey Elfman <elf2...@gmail.com>
Subject Re: _replicator database durability
Date Tue, 24 Sep 2013 15:47:26 GMT
Thanks, very helpful! I'll try it.


2013/9/24 Jason Smith <jhs@apache.org>

> On Tue, Sep 24, 2013 at 10:18 PM, Alexey Elfman <elf2001@gmail.com> wrote:
> > Hello,
> >
> > I'm using CouchDB for our company's billing platform.
> >
> > We have 4 dedicated servers (32-64 GB of RAM, 3-8 TB of disk with SSD
> > cache) in the same datacenter.
> > All servers serve the same set of databases (about 40 databases per
> > machine) with all-to-all replications via the _replicator database.
> >
> > The databases vary a lot, from a few documents to several hundred
> > million documents; two of them are 500 GB+. The documents are simple,
> > without complex structure and with almost no attachments.
> >
> > We have an application to maintain all of these replications, and here
> > is why: we keep running into unpredictable replication failures.
> > For example, a document in the _replicator database can have status =
> > "triggered" while no task with that data is running on the server at
> > that moment.
> > Or a document can even sit without a "source" field for a few minutes
> > every day on every server.
> >
> > Replications crash every hour due to unclear errors like "source
> > database is out of sync, please increase max_dbs_open". max_dbs_open is
> > 800 on every server and there are fewer than 50 databases, so even 50
> > databases times 3 replications each is well under the limit.
> >
> > Creating documents in the _replicator database is problematic too. Example:
> >
> >
> > # first, deleting old one
> > [Fri, 20 Sep 2013 15:41:21 GMT] [info] [<0.24052.0>] 83.240.73.210 - - DELETE /_replicator/example.com_db?rev=10-89450b554d11bf9a6d7e15a136ae663f 200
> >
> > # deleted
> > [Fri, 20 Sep 2013 15:41:24 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET /_replicator/example.com_db?revs_info=true 404
> >
> > # creating new one with same id
> > [Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25844.0>] 176.9.143.85 - - HEAD /_replicator/example.com_db 404
> > # seems created
> > [Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25845.0>] 176.9.143.85 - - PUT /_replicator/example.com_db 201
> >
> > # where is it?..
> > [Fri, 20 Sep 2013 15:41:51 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET /_replicator/example.com_db?revs_info=true 404
>
> I have seen this too. The only thing I can guess is that if you make a
> document but it is identical to something that was already deleted,
> then it remains deleted.
>
> Imagine a create, an update, and a delete:
>
> doc@1 -> doc@2 -> doc@3 (_deleted)
>
> Now suppose I create doc@1 again, identical to the first: every
> key/val is the same as before, so the _rev is identical, since _rev is
> just a checksum of all the key/vals.
>
> doc@1 -> [CouchDB helpfully says "oh, that has already been deleted",
> so it "fast forwards"] -> doc@3 (still _deleted)
>
> When you replicate doc, this is what you want (old revisions from the
> source do not magically come back to life on the target).
>
> The workaround I have found is to force a unique _rev every time. For
> me, I just added "created_at":"2013-09-24T15:28:12" in my replication
> docs. You could also use a UUID.
>
> Happily, this will not change the replication ID. The timestamp value
> is ignored. (Although maybe I could use it later as an audit or
> something.)
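>
> For illustration, a replication doc with such a field might look like
> this (just a sketch; the server and database names here are made up):
>
> {
>   "_id": "example.com_db",
>   "source": "http://server1:5984/db",
>   "target": "http://server2:5984/db",
>   "continuous": true,
>   "created_at": "2013-09-24T15:28:12"
> }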
>
> Side note, since you seem to be serious about replicating. If you *do*
> want to change the replication ID (force a complete restart) then you
> must change either the (a) source, (b) target, (c) filter, or (d)
> query_params.
>
> Usually you cannot change (a), (b), or (c). So once again you can drop
> a timestamp or UUID into query_params. HOWEVER, query_params only
> affects the replication ID if you ALSO have a filter option.
>
> So in other words: you need a no-op filter just so that you can add
> no-op query_params to force a new replication ID.
>
> function(doc, req) {
>   // A no-op filter; req.query.created_at is present but I don't care.
>   return true
> }
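>
> Assuming that filter is saved as "noop" in a design document named
> _design/repl (both names are just examples), the replication doc could
> force a fresh replication ID like this:
>
> {
>   "source": "http://server1:5984/db",
>   "target": "http://server2:5984/db",
>   "filter": "repl/noop",
>   "query_params": {"restarted_at": "2013-09-24T15:41:00"}
> }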
>
> However, once again, since you are serious about replicating and you
> already took the trouble to write a filter function, you may as well
> log stuff to help you troubleshoot later.
>
> function(doc, req) {
>   // A logging filter; req.query comes from the document .query_params object.
>   // In my own code, I put .source and .target in my .query_params object
>   // so I can log it.
>   var id_and_rev = doc._id + "@" + doc._rev
>   var source = req.query.source || '(unknown source)'
>   var target = req.query.target || '(unknown target)'
>   var dir = source + " -> " + target
>
>   log('Replicate ' + dir + ': ' + id_and_rev)
>   return true
> }
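>
> With that filter (say it is saved as "repl/log"), the replication doc
> would carry source and target in query_params so the filter can see
> them, for example:
>
> {
>   "source": "http://server1:5984/db",
>   "target": "http://server2:5984/db",
>   "filter": "repl/log",
>   "query_params": {
>     "source": "http://server1:5984/db",
>     "target": "http://server2:5984/db"
>   }
> }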
>
> >
> >
> > # next try, creating
> > [Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27720.0>] 176.9.143.85 - - HEAD /_replicator/example.com_db 404
> > # and now it is created
> > [Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27730.0>] 176.9.143.85 - - PUT /_replicator/example.com_db 201
> >
> > # because replication starts successfully
>
> Yeah, no idea there. Once I did my created_at trick I had worked
> around this problem for myself and I moved on to other problems.
>



-- 
----------------
Best regards
Alexey Elfman
mailto:elf2001@gmail.com
