couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Multiple database backup strategy
Date Sun, 20 Mar 2016 14:54:36 GMT
I’ll never berate anyone for top-posting (or bottom-posting for that matter). I just follow
suit with whatever the current thread is doing — in this case, very very clearly top-posting ;)

Thank you for making this distinction clear. Personally I was only ever interested in the
first case. Scoping the replicator manager to only learn about _replicator docs on the local
cluster through internal APIs is a smart move — I wasn’t suggesting anything different
there. I do think we should have an efficient way for a client to learn about the existence
of new updates to an arbitrary number of databases on a remote cluster using a single socket.
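
To make the single-socket point concrete, this is roughly what a consumer of the existing
/_db_updates feed looks like today (a Python sketch only; the host, credentials and exact
parameters are illustrative, not a proposal, and "requests" is just a convenient third-party
HTTP client):

    import json
    import requests

    # One long-lived socket reports updates for *every* database on the remote
    # cluster, instead of one _changes connection per database of interest.
    url = "https://user:pass@remote.example.com/_db_updates"
    params = {"feed": "continuous", "heartbeat": 10000}

    with requests.get(url, params=params, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:      # heartbeat newlines just keep the socket alive
                continue
            event = json.loads(line)
            # e.g. {"db_name": "userdb-123", "type": "updated", "seq": "..."}
            print(event.get("db_name"), event.get("type"))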

Adam

> On Mar 20, 2016, at 10:45 AM, Robert Samuel Newson <rnewson@apache.org> wrote:
> 
> Final note, we've conflated two uses of /_db_updates that I want to be very clear on:
> 
> 1) using /_db_updates to detect active source databases of a replication job.
> 2) using /_db_updates to hear about new/updated/deleted _replicator documents.
> 
> It was the 2nd case where the unreliability was a concern, since the update frequency is
> very low, one expects, for _replicator databases.
> 
>> On 20 Mar 2016, at 14:44, Robert Samuel Newson <rnewson@apache.org> wrote:
>> 
>> (I swear I'll stop soon...)
>> 
>> Using /_db_updates as a cheap mechanism to detect activity at the source for any database
>> we're interested in is an important optimization. We didn't discuss it this past week as we
>> felt that /_db_updates wasn't sufficiently reliable. We can save a lot of churn in the
>> scheduler by simply not resuming any job unless we have seen an update to the source database.
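>>
>> Concretely, the gate could be as small as this (a sketch only; nothing here is real code
>> from the tree, and the names are invented):
>>
>>     # Source databases for which /_db_updates has reported activity since the
>>     # owning job last checkpointed.
>>     dirty_sources = set()
>>
>>     def on_db_updates_event(event):
>>         # event is one row from the /_db_updates feed, e.g. {"db_name": ..., "type": ...}
>>         dirty_sources.add(event["db_name"])
>>
>>     def should_resume(job):
>>         # Hand the job back to the scheduler only if its source has changed;
>>         # otherwise leave it parked and skip a no-op resume/checkpoint cycle.
>>         return job.source_db in dirty_sources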
>> 
>> B.
>> 
>>> On 20 Mar 2016, at 14:36, Robert Samuel Newson <rnewson@apache.org> wrote:
>>> 
>>> I missed a point in Adam's earlier post.
>>> 
>>> The current scheme uses couch_event for runtime changes to _replicator docs but has to read
>>> all updates of all _replicator databases at startup. In the steady state it is just receiving
>>> couch_event notifications. The /_db_updates option would change that only slightly (we'd read
>>> /_db_updates from 0 to find all _replicator databases, rather than reading the changes feed
>>> for the node-local 'dbs' database).
>>> 
>>> CouchDB itself has a single /_replicator database, of course, but the code will consider
>>> any database to be a /_replicator database if the name ends that way. i.e., today, if you
>>> made a database called foo/_replicator it would be considered a /_replicator database by
>>> the system (and we'd inject the ddoc, etc).
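>>>
>>> To make the startup scan concrete, it'd be something along these lines (rough Python
>>> sketch; the query parameter and helper names are illustrative only):
>>>
>>>     import requests
>>>
>>>     def replicator_dbs(base_url):
>>>         # Read the /_db_updates history from the beginning and keep every
>>>         # database whose name marks it as a _replicator database.
>>>         resp = requests.get(base_url + "/_db_updates", params={"since": 0})
>>>         resp.raise_for_status()
>>>         found = set()
>>>         for row in resp.json().get("results", []):
>>>             name = row["db_name"]
>>>             if name == "_replicator" or name.endswith("/_replicator"):
>>>                 found.add(name)
>>>         return found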
>>> 
>>> B.
>>> 
>>>> On 20 Mar 2016, at 14:31, Robert Samuel Newson <rnewson@apache.org> wrote:
>>>> 
>>>> Since I'm typing anyway, and haven't yet been dinged for top-posting, I wanted to mention
>>>> one other optimization we had in mind.
>>>> 
>>>> Currently each replicator job has its own connection pool. When we introduce the notion
>>>> that we can stop and restart jobs, those become approximately useless. So we will obviously
>>>> hoist that 'up' to a higher level and manage connection pools at the manager level.
>>>> 
>>>> One optimization that seems obvious from the Cloudant perspective is to allow reuse of
>>>> connections to the same destinations even though they are ostensibly for different domains.
>>>> That is, a connection to rnewson.cloudant.com is ultimately a connection to
>>>> lbX.jenever.cloudant.com. This connection could just as easily be used for any other user
>>>> in the jenever cluster. Thus, if it's idle, we could borrow that connection rather than
>>>> create a new one.
>>>> 
>>>>> host rnewson.cloudant.com
>>>> rnewson.cloudant.com is an alias for jenever.cloudant.com.
>>>> jenever.cloudant.com is an alias for lb2.jenever.cloudant.com.
>>>> lb2.jenever.cloudant.com has address 5.153.0.207
>>>> 
>>>> Rather than add rnewson.cloudant.com -> 5.153.0.207 to the pool, we would add
>>>> lb2.jenever.cloudant.com -> 5.153.0.207 and resolve rnewson.cloudant.com to its ultimate
>>>> CNAME before consulting the pool.
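>>>>
>>>> In code the pool lookup boils down to something like this (sketch only; the pool API
>>>> here is invented for illustration):
>>>>
>>>>     import socket
>>>>
>>>>     def pool_key(hostname):
>>>>         # Follow the CNAME chain and key the pool on the canonical name
>>>>         # (lb2.jenever.cloudant.com) rather than the per-account alias
>>>>         # (rnewson.cloudant.com), so idle connections can be shared across
>>>>         # accounts that land on the same cluster.
>>>>         canonical, _aliases, _addresses = socket.gethostbyname_ex(hostname)
>>>>         return canonical
>>>>
>>>>     def checkout(pool, hostname):
>>>>         key = pool_key(hostname)
>>>>         conn = pool.get(key)          # hypothetical pool interface
>>>>         return conn if conn is not None else pool.connect(key)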
>>>> 
>>>> Does this optimization help elsewhere than Cloudant?
>>>> 
>>>>> On 20 Mar 2016, at 14:22, Robert Samuel Newson <rnewson@apache.org> wrote:
>>>>> 
>>>>> My point is that we can (and currently do) trigger the replication manager on receipt of
>>>>> the database updated event, so it avoids all of the other parts of the sequence you
>>>>> describe which could fail.
>>>>> 
>>>>> The obvious difference, and I suspect this is what motivates Adam's position, is that
>>>>> _db_updates can be called remotely. A solution using /_db_updates as its feed can run
>>>>> somewhere else; it wouldn't even need to be a couchdb cluster. With the current 2.0 scheme,
>>>>> the _replicator db has to live on the nodes performing replication management (and therefore
>>>>> it depends on couch_{btree,file} etc). That's a huge incentive to go the /_db_updates route
>>>>> and it would serve as a model for others like pouchdb that cannot choose to co-locate.
>>>>> 
>>>>> One side-benefit we get from using database updated events from the _replicator shards,
>>>>> though, is that it helps us determine which node will run any particular job. We allocate
>>>>> a job to the lowest live erlang node that hosts the document. If we go with /_db_updates,
>>>>> we'll need some other scheme. That's not a bad thing (indeed, it could be a very good
>>>>> thing), but it would need more thought. While in Seattle, we did discuss both directions at
>>>>> some length and believe we'd need some form of leader election system; the leader would then
>>>>> assign (and rebalance) replication jobs across the erlang cluster. I pointed at a
>>>>> proof-of-concept implementation of an algorithm I trust, which I wrote a while back, at
>>>>> https://github.com/cloudant/sobhuza as a possible starting point.
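>>>>>
>>>>> For reference, the current placement rule amounts to roughly this (illustrative Python,
>>>>> not the actual Erlang):
>>>>>
>>>>>     def owner_node(doc_shard_nodes, live_nodes):
>>>>>         # doc_shard_nodes: nodes holding a copy of the _replicator doc's shard
>>>>>         # live_nodes: erlang nodes currently up in the cluster
>>>>>         candidates = sorted(set(doc_shard_nodes) & set(live_nodes))
>>>>>         return candidates[0] if candidates else None
>>>>>
>>>>> With /_db_updates there is no local shard to hang that decision on, hence the need for a
>>>>> leader (or something like it) to hand the jobs out instead.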
>>>>> 
>>>>> B.
>>>>> 
>>>>> P.S. I'm using Mail.app and simply replying where it sticks the cursor (at the top), but
>>>>> in other forums I've been berated for top-posting. Should I modify my reply style here?
>>>>> 
>>>>> On 19 Mar 2016, at 21:42, Benjamin Bastian <bbastian@apache.org> wrote:
>>>>>> 
>>>>>> When a shard is updated, it'll trigger a "database updated" event. CouchDB will hold
>>>>>> those updates in memory for a configurable amount of time in order to dedupe updates.
>>>>>> It'll then cast lists of updated databases to nodes which host the relevant _db_updates
>>>>>> shards for further deduplication. It's only at that point that the updates are persisted.
>>>>>> Only a single update needs to reach the _db_updates DB. IIRC, _db_updates triggers up to
>>>>>> n^3 (assuming the _db_updates DB and the updated DB have the same N), so it may be a bit
>>>>>> tricky for all of them to fail. You'd need coordinated node failure. Perhaps something
>>>>>> like datacenter power loss. Another possible issue is if all the nodes which host a shard
>>>>>> range of the _db_updates DB are unreachable by the nodes which host a shard range of any
>>>>>> other DB. Even if it was momentary, it'd cause messages to be dropped from the _db_updates
>>>>>> feed.
>>>>>> 
>>>>>> For n=3 DBs, it seems like it'd be difficult for all of those things to go wrong (except
>>>>>> perhaps in the case of power loss or catastrophic network failure). For n=1 DBs, you'd
>>>>>> simply need to reboot a node soon after an update.
>>>>>> 
>>>>>> On Sat, Mar 19, 2016 at 1:31 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>>>>>> 
>>>>>>> Hi Bob, comments inline:
>>>>>>> 
>>>>>>>> On Mar 19, 2016, at 2:36 PM, Robert Samuel Newson <rnewson@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> The problem is that _db_updates is not guaranteed to see every update, so I think it
>>>>>>>> falls at the first hurdle.
>>>>>>> 
>>>>>>> Do you mean to say that a listener of _db_updates is not guaranteed to see every updated
>>>>>>> *database*? I think it would be helpful for the discussion to describe the scenario in
>>>>>>> which an updated database permanently fails to show up in the feed. My recollection is
>>>>>>> that it’s quite byzantine.
>>>>>>> 
>>>>>>>> What couch_replicator_manager does in couchdb 2.0 (though not in the version that
>>>>>>>> Cloudant originally contributed) is to us ecouch_event, notice which are to _replicator
>>>>>>>> shards, and trigger management work from that.
>>>>>>> 
>>>>>>> Did you mean to say “couch_event”? I assume so. You’re describing how the replicator
>>>>>>> manager discovers new replication jobs, not how the jobs discover new updates to source
>>>>>>> databases specified by replication jobs. Seems orthogonal to me unless I missed something.
>>>>>>> 
>>>>>>>> Some work I'm embarking on, with a few other devs here at Cloudant, is to enhance the
>>>>>>>> replicator manager to not run all jobs at once, and it is indeed the plan to have each
>>>>>>>> of those jobs run for a while, kill them (they checkpoint then close all resources) and
>>>>>>>> reschedule them later. It's TBD whether we'd always strip feed=continuous from those.
>>>>>>>> We _could_ let each job run to completion (i.e., caught up to the source db as of the
>>>>>>>> start of the replication job) but I think we have to be a bit smarter and allow
>>>>>>>> replication jobs that constantly have work to do (i.e., the source db is always busy)
>>>>>>>> to run as they run today, with feed=continuous, unless forcibly ousted by a scheduler
>>>>>>>> due to some configuration concurrency setting.
>>>>>>> 
>>>>>>> So I think this is really the crux of the issue. My contention is that permanently
>>>>>>> occupying a socket for each continuous replication with the same source and mediator is
>>>>>>> needlessly expensive, and that _db_updates could be an elegant replacement.
>>>>>>> 
>>>>>>>> I note for completeness that the work we're planning explicitly includes "multi
>>>>>>>> database" strategies; you'll hopefully be able to make a single _replicator doc that
>>>>>>>> represents your entire intention (e.g., "replicate _all_ dbs from server1 to server2").
>>>>>>> 
>>>>>>> Nice! It’ll be good to hear more about that design as it evolves, particularly in
>>>>>>> aspects like discovery of newly created source databases and reporting of 403s and other
>>>>>>> fatal errors.
>>>>>>> 
>>>>>>> Adam
>>>>> 
>>>> 
>>> 
>> 
> 

