couchdb-commits mailing list archives

From Apache Wiki <>
Subject [Couchdb Wiki] Update of "The_CouchDB_Vision" by NoahSlater
Date Mon, 29 Jul 2013 21:06:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "The_CouchDB_Vision" page has been changed by NoahSlater:

  "The number one request from people was to clear up CouchDB’s story, to have a clear,
bold vision that captures people and that they can easily understand and share and support
and move forward."
  "Before I lay it out, I understand that I will be ruffling some feathers. I think that is
both necessary and healthy. I think the picture I am going to paint will make a lot of people
in the CouchDB community happy, some with concessions, but I utterly and strongly believe
that this vision of what CouchDB is has the power to set the course for the next five years
of the project and attract a whole lot of new people both as users and contributors."
+ -
+ I want us to understand what the PRIMARY and SECONDARY features of CouchDB are.
I already feel a bit bad that there are two PRIMARY ones (“a database” *and* “that
replicates”), but I think that is as few as it gets.
+ I want CouchDB’s new identity to be a database that replicates. I want to provide a slide
deck for a “CouchDB in 25 minutes” presentation* that everybody can take and give and
customise, but I want one of the first things you say to be “CouchDB is a database that replicates”.
I want anyone inside the CouchDB developer community (you!), when asked what CouchDB
is, to answer “CouchDB is a database that replicates”, then follow up by explaining what
we mean, and *then* add a few more of the SECONDARY features that you particularly like.
+ (Noah's commentary: how does this play with the idea that everything we do should stem from
our "why". why are we building a database that replicates? what's our vision? what do we stand
for? i think both models are compatible. our existing approach is to say "what" couchdb is.
jan's suggestion is to start with "how", and then get to "what". i am suggesting that we add
another one on top of that, and start with "why". then say "how", and then say "what". i don't
think these are incompatible. apple might have "challenge the status quo" as a "why", but
its marketing can still lead with the one sentence "how" in the same vein as "couchdb is
a database that replicates". some thinking / discussion to do here. i think it will depend
on context. homepage, talk, etc, etc. even jan's talk that was linked starts, essentially,
with the "why". his "why" is listed as "i <3 the web", "i <3 reliable web infrastructure"!
so jan is already doing this in his talks! so maybe this is a template. "we/i love X. we believe
in Y. [BEAT] which is why i hack on couchdb. it's a database that replicates". voila! @@ these
need bringing back to this list, or working into a questions/issues section)
+ @@ go through
+ I want people who barely look at CouchDB, commenting on an unrelated Hacker News thread,
to write “…CouchDB is a database that replicates, maybe that is a better fit for your problem”.
+ I want the CTO of the newly funded startup to think “I seem to have a replication problem
to solve, maybe CouchDB can help.”
+ I want to move CouchDB’s development forward, and when we ask ourselves whether to add
a feature, we run it by our PRIMARY feature set and ask “does it support ‘CouchDB is a
database that replicates’” and if it does we go ahead and build it, and if it doesn’t
we may consider it as a SECONDARY feature, or we discard it altogether.
+ (Noah's commentary: again, i'm gonna go one step higher than this. and i'm gonna suggest
that we also ask ourselves "how does this help us work towards our why?")
+ (I don’t actually care what the final slogan will be, and please bike-shed
+  this to no avail, but it should capture what I mean with “CouchDB is a
+  database that replicates”, a phrase that we can burn into everybody’s
+  head that captures CouchDB’s PRIMARY feature, its PRIMARY value
+  proposition, the ONE thing that explains WHY we are excited about
+  CouchDB.)
+ [comments on plugin system elided]
  == Why? ==
@@ -103, +130 @@

  "There are many other things that people like about CouchDB: reliability, no schema, HTTP
interface, the view system, etc. But NONE of these people would care if CouchDB didn’t have
multi-master replication."
+ CouchDB is a database that replicates.
+ Think of it as git for your data-layer. Not in a sense where you manage text files and diff
and merge, but in the sense that you have a local version of your data and one or multiple
remote ones and you can seamlessly move your data between them, back and forth and crossover.
+ Imagine a local checkout of your data that you can work on, and then share it with Lucie
across the table, she finds some issues and fixes up the data, and shares it with Tim across
the room. Tim fixes two more issues and you pull both their changes into your copy. We conclude
the whole thing is golden and we push it to staging, where our continuous integration runs
and decides that the data is good to go into production, so it pushes it to production. There
the data is picked up from various clients, some mobile over there, some web over here, a
backup system in the Tokyo office…
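The share-and-pull workflow above can be sketched as a toy model. This is only an illustration under loud assumptions: the `Replica` class, its integer revision counter, and the "highest revision wins" rule are hypothetical simplifications, not CouchDB's replication protocol, and concurrent edits (conflicts) are not handled at all.

```python
# Toy model of "a local copy plus remotes you can pull from".
# Assumption: a plain integer revision per document stands in for real
# revision tracking; conflicting concurrent edits are out of scope.
class Replica:
    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc_id -> (revision_number, body)

    def put(self, doc_id, body):
        """Create or update a document locally, bumping its revision."""
        rev, _ = self.docs.get(doc_id, (0, None))
        self.docs[doc_id] = (rev + 1, body)

    def pull_from(self, other):
        """Copy every document for which `other` holds a newer revision."""
        copied = 0
        for doc_id, (rev, body) in other.docs.items():
            local_rev, _ = self.docs.get(doc_id, (0, None))
            if rev > local_rev:
                self.docs[doc_id] = (rev, body)
                copied += 1
        return copied


me, lucie, tim = Replica("me"), Replica("lucie"), Replica("tim")
me.put("report", {"cases": 10})
lucie.pull_from(me)
lucie.put("report", {"cases": 12})      # Lucie fixes up the data
tim.pull_from(lucie)
tim.put("notes", {"text": "checked"})   # Tim adds his own fix
me.pull_from(lucie)                     # pull both their changes
me.pull_from(tim)
```

After the last two pulls, `me` holds Lucie's corrected report and Tim's notes, and pulling again copies nothing because all three replicas agree.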
+ Or you have hospitals in remote regions of Africa that collect local health data, such as how
many malaria infections a region has. They all share their results over unreliable mobile
connections, and the data still makes it eventually, maybe with a few hours' delay. The malaria
expert in the capital city sees an outbreak of some illness and is able to send
out medicine in time to help the patients. Today, the expert takes months
to travel between the hospitals to collect that data manually, only to find out that there was
a lethal outbreak two months ago and everybody died.
+ (Somebody built this, CouchDB does save lives, I get teary every time I tell this story
(like now). Our work doesn’t get more noble than this.)
+ Or imagine millions of mobile users with access to terabytes of data in the cloud, replicating
the bits they need to their phones and tablets, allowing super-fast low-latency access for
a stellar user experience, while giving access to sheer amounts of data and allowing full
write access on the mobile device to be replicated back to the cloud when connections exist.
+ (Our friends at Cloudant have a couple of those customers.)
+ That is the power of CouchDB.
+ -
+ Replication is the PRIMARY feature of CouchDB. “is a database” means “stores your
data, safely and securely”, “that replicates” highlights the primary feature.
+ do these bits belong here or in previous section? - There are many more very cool features
of CouchDB, even the details on how we achieve reliability and data safety or how replication
works are mindblowingly cool. The simple HTTP interface, the JSON store, the app-server features,
map reduce views, all very excellent things that make CouchDB unique, but it is very important
to understand that they are SECONDARY features.
+ (@@ does this bit go into the "what" bit? need to research difference. think we can lead
with replication as the primary feature, but include it in the "what"?)
+ -
+ @@ where does this bit go? should it even be included? might be worth punting the whole
"couch-like" stuff to a separate doc, and only referencing it from this vision statement?
+ And then, CouchDB is one more thing. CouchDB isn’t just the Erlang implementation of this
whole replicating database idea. CouchDB is also the wire protocol, the specification that
makes all the magic work. Apache CouchDB is the focal point for The Replicating Society*.
+ (* cue your Blade Runner jokes)
+ Apache CouchDB is THE standard for data freedom and exchange and is the clearing house,
the centre for an ecosystem that includes fantastic projects like PouchDB and the TouchDBs,
Max Ogden’s `dat` and whichever else follow these. Not saying we merge those projects in,
they can stand on their own, but we should embrace everything that makes the interoperable
replication world a reality.
+ Apache CouchDB is going to be the centre of the data replication universe.
+ (Noah's commentary: I think we should call this "Couch" and capitalise on the "-DB"-less
name that people have used elsewhere. this should be a reclamation effort on our part, to
own, and define what a "couch-like" system is. this needs further discussion on the list.)
  == What? ==
   * What do we do?
@@ -121, +186 @@

   * couchapp
   * Message hub (Nodejitsu, hoodie are using couchdb as a message hub somehow)
+ Jan outlines his idea of a "core":
+  * remote & local replication
+  * MR-views & GeoCouch enabled by default (ideally abstracted away with nice “query
+  * HTTP interface
+  * Fu/Fauxton
+  * configuration
+  * stats
+  * docs
+  * plugin system with Erlang (and in the future JavaScript support via Node.js)
+ Also:
+  * plugin system
+ Note also:
+ "And yes, this explicitly includes things like shows and lists and update functions and
rewrites and vhosts. We should make it super simple to add these, but for a default experience,
they are very, very confusing. We should have a single plugin “CouchApp Engine” which
includes Benoit’s vision of CouchApps done right that is just a click away to install."
+ Jan lays out our "specs":
+  * Apache CouchDB implements the CouchDB vision: It is a database that replicates.
+  * Document Database:
+    * Data records are standard JSON.
+    * Unlimited Binary data storage with attachments.
+    * (alternatively arbitrary mime docs with special rules for JSON docs)
+  * Fault-tolerant:
+    * Data is always safe. Tail-append storage ensures no messing with already committed
+    * Errors are isolated, recovery is local and doesn’t affect other parallel requests.
+    * Recovery of fatal errors is immediate. There is no “fixup phase” after a restart.
+    * Software updates and bugfix deployment without downtime.
+  * Highly Concurrent:
+    * Erlang makes good use of massively parallel network server installations.
+    * Garbage collection happens roughly on a per-request basis. GC in one request doesn’t
affect other requests.
+  * Cluster / BigCouch / Big Data:
+    * Includes a Dynamo-style clustering and cluster-management feature that allows spreading
data and load over multiple physical machines.
+    * Scales up to Petabytes of data.
+  * Secondary 2D and 3D indexing
+    * Using incremental and asynchronous index updates for high-performance queries.
+  * Makes good use of hardware:
+    * Tail-append storage allows for serial write access to storage media, which is a best-case scenario
for spinning disks and SSDs.
+  * Small Core & Flexible Plugin System:
+    * Some features are only useful for a small group of people; these can be installed with
a super simple plugin management system that is built into the admin interface.
+    * Get new features with a click or tap.
+    * Plugins can be written in Erlang (and in JavaScript in the future).
+  * Cross Platform Support
+    * Runs on any POSIX UNIX as well as Windows.
+    * Support for some embedded devices like Android and RaspberryPi.
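The "tail-append storage" points in the specs above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `AppendOnlyStore`, its one-JSON-record-per-line file layout, and the index rebuild on open are hypothetical and are not CouchDB's actual on-disk format. The only idea it tries to show is that an update appends a new record at the tail and never rewrites bytes that were already committed.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy append-only document store (hypothetical, not CouchDB's format)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # doc_id -> byte offset of the latest record
        open(path, "ab").close()  # create the file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan the log once; later records for the same id win.
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                self.index[json.loads(line)["id"]] = offset

    def put(self, doc_id, body):
        record = json.dumps({"id": doc_id, "body": body}).encode() + b"\n"
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # new data only goes at the tail
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # make the write durable before indexing it
        self.index[doc_id] = offset

    def get(self, doc_id):
        with open(self.path, "rb") as f:
            f.seek(self.index[doc_id])
            return json.loads(f.readline())["body"]


path = os.path.join(tempfile.mkdtemp(), "toy.log")
store = AppendOnlyStore(path)
store.put("a", {"v": 1})
with open(path, "rb") as f:
    first_record = f.read()
store.put("a", {"v": 2})          # an update appends; nothing is overwritten
reopened = AppendOnlyStore(path)  # simulates a restart
```

Because writes only ever extend the file, the bytes of the first record are still intact after the update, and a reopened store sees the latest version of the document.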
