couchdb-user mailing list archives

From Jens Alfke <j...@couchbase.com>
Subject Re: decentralized chat
Date Sat, 24 Nov 2012 17:52:49 GMT

On Nov 23, 2012, at 7:14 PM, Kragen Javier Sitaker <kragen@canonical.org> wrote:

> Hi.  I'm new to CouchDB, but I was just chatting with Noah on IRC about
> how IRC sucks and we need to replace it and whether we could do that
> using CouchDB.

This is a topic near & dear to my heart, and in fact this sort of thinking is what led
me to CouchDB in the first place. The cool thing is that if you generalize a bit you can hit
a much wider target than just IRC/chat, so you can build other apps like microblogging or
forums on the same architecture.

> [Noah suggested](http://swhack.com/logs/2012-11-24#T02-25-22) making
> each chat line a new document, using continuous replication, creating a
> mapreduce view using datetime as a key…

Yeah, sounds reasonable. The number of documents does get large over time, though. Eventually
one might consolidate an entire batch of chat messages into a single document, say one
document per day.
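
As a rough sketch of that consolidation step, the Python below groups per-line message
documents into one document per day. The field names (`time`, `nick`, `text`), the
`daily_log` type and the `log-YYYY-MM-DD` ID scheme are assumptions made up for illustration,
not anything CouchDB prescribes.

```python
from collections import defaultdict


def consolidate_by_day(messages):
    """Group per-line chat docs into one doc per day (hypothetical schema)."""
    by_day = defaultdict(list)
    for msg in messages:
        # Assumes ISO 8601 timestamps, which start with YYYY-MM-DD.
        day = msg["time"][:10]
        by_day[day].append(msg)

    daily_docs = []
    for day, msgs in sorted(by_day.items()):
        lines = sorted(
            ({"time": m["time"], "nick": m["nick"], "text": m["text"]} for m in msgs),
            key=lambda line: line["time"],
        )
        daily_docs.append({"_id": "log-" + day, "type": "daily_log", "lines": lines})
    return daily_docs
```

The per-line originals would still have to be deleted (and eventually compacted away) after
the daily document is written, which this sketch does not cover.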

> * I'm behind NAT.  If you're behind NAT too, how can we set up
>  continuous replication between our CouchDB instances?  Is there STUN
>  support for CouchDB replication yet?

AFAIK, NAT-busting techniques like STUN only work reliably for UDP, and CouchDB uses TCP.

Many (most?) NATs support protocols for requesting a public listening port, such as NAT-PMP
or UPnP, and Apple’s Bonjour APIs even include a high-level API for accessing this.

There’s still the problem of discovery, though: having opened a port, how do I tell you
its address? This is difficult, and it’s the kind of thing every P2P system has to deal
with.

> * Do I need to create a new CouchDB database for every chat room?  Is
>  there any problem with having 20 or 30 databases on my netbook talking
>  at once?

Not really. Continuous replication will keep a socket open per database, as well as opening
more temporarily during replication, but that’s no problem until you get to tens of thousands
of dbs.

> * How about if I want to have a single Comet connection from my browser
>  to all of them at once?  (Browsers won't let you have 30 Comet
>  connections to localhost.)


That isn’t going to scale for a browser-based app, since you’ll have to open a _changes
feed per database, unless you do something special on the server side like multiplexing all
the feeds.
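
To make the multiplexing idea concrete, here is a minimal Python sketch that merges several
change streams into a single stream tagged with the database name, so the browser would only
need one connection. Plain iterators stand in for the per-database _changes feeds; the
`multiplex` function and its `(db_name, change)` output shape are assumptions for
illustration, not an existing CouchDB feature.

```python
import queue
import threading


def multiplex(feeds):
    """Merge {db_name: iterator_of_changes} into one stream of (db_name, change)."""
    merged = queue.Queue()
    done = object()  # sentinel: one feed has finished

    def pump(name, feed):
        # Forward every change from one feed, then signal completion.
        for change in feed:
            merged.put((name, change))
        merged.put(done)

    for name, feed in feeds.items():
        threading.Thread(target=pump, args=(name, feed), daemon=True).start()

    remaining = len(feeds)
    while remaining:
        item = merged.get()
        if item is done:
            remaining -= 1
        else:
            yield item
```

Order is preserved within each feed but not across feeds, which matches what separate
per-database connections would give you anyway.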

But really, I think you’d need a bunch of custom code in the database server anyway to handle
stuff like discovery. If you want this to really be P2P it’s not going to be feasible as
a CouchApp.

> * Are there security concerns I need to think about?  

Oh my yes!

> Like, how do I
>  make it so that I can update my DHTML UI, and maybe even automatically
>  get updates from someone else, but not *everybody* I chat with can
>  update my DHTML UI to a version that spies on my chats?  

Use the browser for the local UI if you want (personally, I’d write a native app instead,
using TouchDB) but don’t have it in charge of anything to do with security. Validation should
happen at the server level.

> What are the security properties of the replication protocol?  

Generally it uses a combination of SSL and HTTP auth. The latter is useless for a P2P system
(it only authenticates one party). SSL can be used to authenticate both peers if you use client
certs, but it requires that either (a) you have a central authority that issues certs and
binds them to meaningful identities; or (b) you set up another P2P system for replicating
known certs and webs of trust.

Even if you’ve got that, you still have the problem that this authenticates the _servers_,
not the _documents_. In a mesh network servers are going to be relaying documents, so the
peer is going to be sending you documents it didn’t create but is merely forwarding on.
That means you’d have to trust every server that’s ever been in your network to not forge
or tamper with messages, which is unrealistic.

Instead you have to sign the messages. The great thing about that is that you don’t have
to trust the replication/transmission protocol at all. You could replicate by accepting floppy
disks handed to you by shadowy figures on street corners, and you’d still be able to tell
whether a message is authentic or not. But you now have a key-distribution problem: you
need a public-key infrastructure (PKI) as a trusted source of people’s keys. Otherwise if
you receive a signed message from Kim you have no idea whether the key that signed it actually
belongs to Kim or not.
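
The property described above, that a signed document can be checked no matter how it reached
you, can be sketched in Python. To stay within the standard library this uses a Lamport
one-time signature over the SHA-256 hash of the canonical JSON. That scheme is a stand-in
chosen only so the sketch runs without dependencies: each key pair is safe for a single
message, so a real system would more likely use a standard scheme such as Ed25519 from a
crypto library. All function names here are hypothetical.

```python
import hashlib
import json
import secrets


def _h(data):
    return hashlib.sha256(data).digest()


def generate_keypair():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _digest_bits(doc):
    # Canonical JSON so signer and verifier hash exactly the same bytes.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = _h(canonical)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(doc, private):
    # Reveal one secret per digest bit (this is why the key is one-time).
    return [private[i][bit] for i, bit in enumerate(_digest_bits(doc))]


def verify(doc, signature, public):
    bits = _digest_bits(doc)
    return len(signature) == 256 and all(
        _h(sig) == public[i][bit] for i, (sig, bit) in enumerate(zip(signature, bits))
    )
```

Note that `verify` only proves the document was signed by whoever holds the matching private
key. Whether that key really belongs to Kim is exactly the key-distribution problem, and the
sketch does nothing to solve it.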

> What if I need to kick someone out of a chat channel because they're spamming it?

You need an authorization mechanism. For example, you could have a document with a special
ID like “member_list” that maps people’s IDs (e.g. public keys) to privileges. To kick
someone you edit the document to remove their ID. To control who can edit the member list,
you define an ‘admin’ privilege and set up a validation function that refuses updates
that aren’t signed by someone with pre-existing admin privileges.
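
The logic of such a validation function might look like the Python sketch below. CouchDB's
own validation functions (validate_doc_update) are written in JavaScript, so this only
illustrates the rule, not deployable code. It assumes the signature has already been checked
and the signer's ID is passed in as `signer_id`; the `members` field layout is likewise made
up for the example.

```python
def validate_member_list_update(new_doc, old_doc, signer_id):
    """Reject member_list edits not signed by someone who is already an admin.

    signer_id is assumed to be the already-verified ID (e.g. public key)
    of whoever signed new_doc.
    """
    if new_doc.get("_id") != "member_list":
        return  # other documents are outside the scope of this sketch
    if old_doc is None:
        # Assumption: the very first member list is accepted as-is. A real
        # system would have to pin who is allowed to create it.
        return
    privileges = old_doc.get("members", {}).get(signer_id, [])
    if "admin" not in privileges:
        raise PermissionError("member_list updates must be signed by an existing admin")
```

Checking against the old document rather than the new one is the point: otherwise a
spammer could grant themselves admin in the very update being validated.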

As you can see I’ve thought about this a lot :) but I haven’t really ever written everything
down in one place, or made a coherent design document. I should do that…

—Jens