couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Question about document copies & replication
Date Wed, 11 Nov 2009 05:52:54 GMT
On Tue, Nov 10, 2009 at 8:19 PM, David Nolen <dnolen.lists@gmail.com> wrote:
> Ok,
>
> My mind is being blown by CouchDB :D
>
> So I've realized that having a few databases per user is a really great idea
> if you decide to scale by decentralizing your content (clients do the heavy
> lifting by running queries on their local couchdb instances - since you're
> replicating to the client you don't really care that a lot of data is
> getting copied around). Every user has their own view of the world, and the
> server CouchDB instance is really only for dealing with content shared
> between users.

David,

I've definitely considered this mode of operation. I think the
simplest way to model a Twitter-like message distribution is to have
multiple databases for each user as well as a global firehose.

Let's assume the user is browsing against a CouchDB running on their laptop.

To publish a message, the user saves a document to a "publish"
database. At replication time this is pushed to the global firehose.

Users pull from the firehose(s) using filtered replication to only
copy docs authored by people they're interested in.
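
As a rough sketch of the push and the filtered pull described above, the
request bodies for CouchDB's `_replicate` endpoint could be built as below.
This assumes a CouchDB release that supports the `filter` and `query_params`
replication fields. The firehose URL, the database names, the
`app/by_author` filter name, and the `authors` parameter are all made up for
illustration; the filter itself would be a JavaScript function in a design
document on the source and is not shown.

```python
import json

# Hypothetical firehose location, not taken from the thread.
FIREHOSE = "http://example.com:5984/firehose"


def push_publish(local_db="publish", firehose=FIREHOSE):
    """Body for POST /_replicate: push the local publish db to the firehose."""
    return {"source": local_db, "target": firehose}


def pull_followed(authors, local_db="timeline", firehose=FIREHOSE):
    """Body for POST /_replicate: pull only docs that pass a filter.

    "app/by_author" is an assumed design-doc/filter name; the filter is
    expected to read the comma-separated "authors" query parameter.
    """
    return {
        "source": firehose,
        "target": local_db,
        "filter": "app/by_author",
        "query_params": {"authors": ",".join(sorted(authors))},
    }


if __name__ == "__main__":
    print(json.dumps(push_publish(), indent=2))
    print(json.dumps(pull_followed({"alice", "bob"}), indent=2))
```

Each body would be POSTed to `/_replicate` on the laptop's CouchDB whenever
the user goes online.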

The user can also maintain an inbox in the cloud. E.g., the host of the
global firehose (or other hosts) can maintain a database that people
can write to but not read from. Then as a user I can send direct
messages to other people's inboxes, which they will see at replication
time.
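
A minimal sketch of sending a direct message, assuming the inbox is an
ordinary database that accepts a document via POST. The inbox URL and the
document fields (`type`, `from`, `body`) are hypothetical, and the
write-but-not-read permissions on the inbox are not covered here. The
request is only built, not sent.

```python
import json
import urllib.request


def direct_message_request(inbox_url, sender, text):
    """Build (but do not send) a POST that saves a message doc to an inbox db."""
    # Field names are illustrative; CouchDB does not prescribe a schema.
    doc = {"type": "direct-message", "from": sender, "body": text}
    return urllib.request.Request(
        inbox_url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = direct_message_request(
        "http://example.com:5984/inbox-david", "jchris", "hello"
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually deliver the message.
```

The recipient would then see the message the next time they replicate the
inbox down to their laptop.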

The user can also maintain a public replication of the "publish"
database (unmerged from the firehose), which could include just
remarks written by the user, but optionally include other messages the
user has seen and saved in the publish db.

>
> In our application (http://shiftspace.org), I'm thinking something along
> these lines:
>
> Client's laptop CouchDB instance:
> user/private - all the documents a user has created plus replicated content
> from groups and whatnot
> user/public - all the user's public content
> user/inbox - short messages
>

Your layout makes sense.

I'd make the private database its own db, and make another db for
replicated content. I'm thinking the private data is gonna be
important things like medical records and stuff, so I don't want to
just mix it with everything else.

You also want to make sure there is a publicly write-able inbox
database for each user in the cloud, so users can send each other
direct messages.

> Server CouchDB instance:
> group/x - group dbs of shared content. Replicated downstream to individual
> user/private who belong to the group
> master - user/public dbs replicated upstream to here.
>
> So my question is this. When a user publishes a document, it is written to
> user/private. If the user publishes a document to the world, we make a copy
> of it in user/public - it's just the same data minus the _rev field.
> Whenever a user updates a public document, we update the user/private copy
> as well as the user/public copy which will be replicated upstream to the
> server.

This is why the user should save to the publish db (e.g. instead of the
"drafts" db), and let replication send the publish db to the firehose.
Then it becomes clear that the private db is only for data I want to
avoid replicating except very carefully.

>
> So my question for the CouchDB gurus, will creating copies of documents in
> this manner create potential problems?
>

Technically you can do what you're shooting for, but it might be
better to use replication instead of saving to multiple dbs. I've been
thinking the replicator deserves an option to specify an array of
docids to replicate, which could be useful in this application.
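
To make the idea concrete, here is a sketch of what such a replication
request might look like. It assumes a `doc_ids` field, which is the name
later CouchDB releases accept for this purpose; the database names and doc
ids in the usage example are placeholders.

```python
import json


def replicate_docs(source, target, doc_ids):
    """Body for POST /_replicate that copies only the listed documents."""
    ids = sorted(set(doc_ids))  # drop duplicates, keep the order stable
    if not ids:
        raise ValueError("doc_ids must name at least one document")
    return {"source": source, "target": target, "doc_ids": ids}


if __name__ == "__main__":
    # Publish two documents from the private db without copying them by hand.
    body = replicate_docs("user-private", "user-public", ["shift-2", "shift-1"])
    print(json.dumps(body, indent=2))
```

With something like this, publishing a document is one replication call
rather than a second save, so both databases hold the same revision of the
document.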

Glad to help,

Chris

> Thanks much,
> David
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
