couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject CouchDB and Kubernetes
Date Sat, 30 Apr 2016 02:55:00 GMT
Hi all,

I’ve been doing a bit of poking around the container orchestration space lately and looking at
how we might best deploy a CouchDB 2.0 cluster in a container environment. In general I’ve
been pretty impressed with the design point of the Kubernetes project, and I wanted to see
how hard it would be to put together a proof of concept.

As a preamble, I needed to put together a container image for 2.0 that just runs a single
Erlang VM instead of the container-local “dev cluster”. You can find that work here:

https://github.com/klaemo/docker-couchdb/pull/52

So far, so good - now for Kubernetes itself. My goal was to figure out how to deploy a collection
of “Pods” that could discover one another and self-assemble into a cluster. Kubernetes
differs from the traditional Docker network model in that every Pod gets an IP address that
is routable from all other Pods in the cluster. As a result there’s no need for some of
the port gymnastics that one might encounter with other Docker environments - each CouchDB
pod can listen on 5984, 4369 and whatever distribution port you like on its own IP.

What you don’t get with Pods is a hostname that’s discoverable from other Pods in the
cluster. A “Service” (a replicated, load-balanced collection of Pods) can optionally have
a DNS name, but the Pods themselves do not. This throws a wrench in the most common distributed
Erlang setup, where each node gets a name like “couchdb@FQDN” and the FQDNs are resolvable
to IP addresses via DNS.

It is certainly possible to specify an Erlang node name like “couchdb@12.34.56.78”,
but we need to be a bit careful here. CouchDB is currently forcing the Erlang node name to
do “double-duty”; it’s both the way that the nodes in a cluster figure out how to route
traffic to one another and it’s the identifier for nodes to claim ownership over individual
replicas of database shards in the shard map. Speaking from experience it’s often quite
useful operationally to remap a given Erlang node name to a new server and have the new server
be automatically populated with the replicas it’s supposed to own. If we use the Pod IP
in Kubernetes for the node name we won’t have that luxury.
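
A minimal sketch of the IP-based naming described above, in Python purely for illustration. The `POD_IP` environment variable and the `erlang_node_name` helper are assumptions, not something CouchDB or Kubernetes defines; Kubernetes can expose a Pod's IP to the container through its downward API, but the variable name is whatever the Pod spec chooses.

```python
import ipaddress
import os


def erlang_node_name(pod_ip: str, prefix: str = "couchdb") -> str:
    """Build an Erlang node name such as couchdb@12.34.56.78 (hypothetical helper)."""
    # Parse the address so a malformed value fails here rather than at cluster join.
    ip = ipaddress.ip_address(pod_ip)
    return f"{prefix}@{ip}"


if __name__ == "__main__":
    # POD_IP is an assumed variable name; the fallback is the example address above.
    print(erlang_node_name(os.environ.get("POD_IP", "12.34.56.78")))
```

A name built this way changes whenever the Pod is rescheduled onto a new IP, which is exactly the remapping problem described above.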

I think the best path forward here would be to extend the “Node” concept in a CouchDB cluster
so that it has an identifier which is allowed to be distinct from the Erlang node name. The
“CouchDB Node” is the one that owns database shard replicas, and it can be remapped to
different distributed Erlang nodes over time via modification of an attribute in the _nodes
DB.
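
The indirection proposed here can be sketched as a toy model, again in Python for illustration only. The class, the field names, the node identifiers and the shard range are all hypothetical; this is not how CouchDB stores its shard map or its _nodes documents today, just one way to picture ownership being keyed by a stable CouchDB node identifier while the Erlang node name is a mutable attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    # CouchDB node id -> current Erlang node name (stands in for the
    # attribute that would live in the _nodes DB; hypothetical layout).
    erlang_name: Dict[str, str] = field(default_factory=dict)
    # Shard range -> CouchDB node ids that own a replica of it.
    shard_map: Dict[str, List[str]] = field(default_factory=dict)

    def route(self, shard_range: str) -> List[str]:
        """Resolve shard owners to the Erlang nodes currently backing them."""
        return [self.erlang_name[node_id] for node_id in self.shard_map[shard_range]]

    def remap(self, node_id: str, new_erlang_name: str) -> None:
        """Point a CouchDB node at a new Erlang node; the shard map is untouched."""
        self.erlang_name[node_id] = new_erlang_name


if __name__ == "__main__":
    cluster = Cluster(
        erlang_name={"node-a": "couchdb@10.0.0.5", "node-b": "couchdb@10.0.0.6"},
        shard_map={"00000000-7fffffff": ["node-a", "node-b"]},
    )
    # The Pod behind node-a is replaced and comes back with a new IP.
    cluster.remap("node-a", "couchdb@10.0.0.9")
    print(cluster.route("00000000-7fffffff"))
```

After the remap, the shard map still lists node-a and node-b as owners; only the routing target for node-a has changed.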

Hope you all found this useful - I’m quite interested in finding ways to make it easier
for users to acquire a highly-available cluster configured in the “right way”, and I think
projects like Kubernetes have a lot of promise in this regard. Cheers,

Adam