couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: What's our Why?
Date Wed, 24 Jul 2013 21:36:06 GMT
I have a dream…

(pardon the plagiarism)

I want to live in a world where people are empowered to understand
and are able to decide where their data lives. I want to live in
a world where developers build apps that support that, not because
they went out of their way to implement it, but because it is a
feature of the software platform they are using.

I want to be able to help people improve their lives in regions of
the world where network access isn’t ubiquitous. Sometimes that is
just a major western capital’s subway, but more likely it is a less
developed location, or a rural area that will never see mobile
broadband, let alone wired broadband, because there is no financial
incentive.

I want to live in a world where technology solves more problems than
it creates. One way to get there is to allow people to use software wherever
they are in whatever context they need it in. More often than not,
that means far away from fast network access (Despite what @dhh is
trying to tell you).

My primary motivation for working on Apache CouchDB is to help build
the world I want to live in. The same motivation drives my work on
Hoodie (http://hood.ie), which builds on top of CouchDB and
wouldn’t be possible without it.


* * *

In the past year I have interviewed a fair number of people, let’s
say 50, from those who have heard about CouchDB to users to core devs.

The ONE feature that makes CouchDB relevant is multi-master replication.
There is no exception, this is the ONE thing that makes CouchDB
exceptional. NOBODY else has that, and even the decent proprietary
solutions that are just coming to market suck where we KICK ASS.

There are many other things that people like about CouchDB: reliability,
no schema, HTTP interface, the view system, etc. But NONE of these people
would care if CouchDB didn’t have multi-master replication.


* * *

The number one thing that people did NOT like about CouchDB is that it
is confused. CouchDB has a torn identity, half database, half
application server. It wasn’t clear (and I am partly responsible for this)
what CouchDB is and wants to be. In everybody’s defence, I think, it
just took a while to figure it out. Now is a good time to put our
findings in writing and fix this.

The number one request from people was to clear up CouchDB’s story,
to have a clear, bold vision that captures people and that they can
easily understand and share and support and move forward.


* * *

Here is a narrative about what CouchDB has that has formed in my head
in the past year. I have shared this with some people privately for some
feedback and they all liked it, so it has that going for it. I also tried
bringing some of these issues up in presentations I have given, again
to great feedback.

E.g.:

  http://www.youtube.com/watch?v=7mdG-iAizVc or
  http://www.youtube.com/watch?v=edbi9jJZkpg

Before I lay it out, I understand that I will be ruffling some feathers.
I think that is both necessary and healthy. I think the picture I am
going to paint will make a lot of people in the CouchDB community happy,
some with concessions, but I utterly and strongly believe that this
vision of what CouchDB is has the power to set the course for the next
five years of the project and attract a whole lot of new people both
as users and contributors.



* * *

CouchDB is a database that replicates.

Think of it as git for your data-layer. Not in a sense where you manage
text files and diff and merge, but in the sense that you have a local
version of your data and one or multiple remote ones and you can
seamlessly move your data between them, back and forth and crossover.

Imagine a local checkout of your data that you can work on, and then
share it with Lucie across the table, she finds some issues and fixes
up the data, and shares it with Tim across the room. Tim fixes two
more issues and you pull both their changes into your copy. We conclude
the whole thing is golden and we push it to staging, where our continuous
integration runs and decides that the data is good to go into production,
so it pushes it to production. There the data is picked up from various
clients, some mobile over there, some web over here, a backup system
in the Tokyo office…
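
(To make the git analogy a bit more concrete, here is a minimal sketch
 of what a "pull" and a "push" look like as requests against CouchDB's
 `_replicate` HTTP endpoint. The host names, the database name and the
 helper function are made up for illustration, and the sketch only
 builds the requests, it does not send them.)

```python
import json
from urllib.request import Request


def replication_request(server, source, target, continuous=False):
    """Build (but do not send) a POST to a CouchDB server's /_replicate."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True  # keep replicating as new changes arrive
    return Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database name, for illustration only.
MINE = "http://localhost:5984"
LUCIE = "http://lucie.example.com:5984"
STAGING = "http://staging.example.com:5984"

# "Pull" Lucie's changes into my copy, then "push" my copy to staging.
pull = replication_request(MINE, LUCIE + "/data", MINE + "/data")
push = replication_request(MINE, MINE + "/data", STAGING + "/data")

# urllib.request.urlopen(pull) would actually start the replication.
```

Pull and push are the same call with source and target swapped, which
is pretty much the whole point.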

Or you have hospitals in remote regions in Africa that collect local
health data, like how many malaria infections a region has, and they all
share their results over unreliable mobile connections. The data still
makes it eventually, maybe with a few hours’ delay, and the malaria
expert in the capital city sees an increased outbreak of some illness
and is able to send out medicine in time to arrive for the patients
to help. Today, the expert takes months to travel between the
hospitals to collect that data manually, only to find out that there was
a lethal outbreak two months ago and everybody died.

(Somebody built this, CouchDB does save lives, I get teary every time
 I tell this story (like now). Our work doesn’t get more noble than
 this.)

Or imagine millions of mobile users with access to terabytes of
data in the cloud, replicating the bits they need to their phones
and tablets, allowing super-fast low-latency access for a stellar
user experience, while giving access to sheer amounts of data and
allowing full write access on the mobile device to be replicated
back to the cloud when connections exist.

(Our friends at Cloudant have a couple of those customers.)


That is the power of CouchDB.


* * *

Replication is the PRIMARY feature of CouchDB. “is a database” means
“stores your data, safely and securely”, “that replicates” highlights
the primary feature.

There are many more very cool features of CouchDB, even the details
on how we achieve reliability and data safety or how replication
works are mindblowingly cool. The simple HTTP interface, the JSON
store, the app-server features, map reduce views, all very excellent
things that make CouchDB unique, but it is very important to understand
that they are SECONDARY features.


* * *

I want to learn from understanding what the PRIMARY and SECONDARY
features for CouchDB are. I already feel a bit bad that there are two
PRIMARY ones (“a database” *and* “that replicates”), but I
think that is as few as it gets.

I want CouchDB’s new identity to be a database that replicates. I want
to provide a slide deck for a “CouchDB in 25 minutes” presentation* that
everybody can take and give and customise, but I want one of the
first things you say to be “CouchDB is a database that replicates”. I want
anyone inside the CouchDB developer community (you!), when asked
what CouchDB is, to answer “CouchDB is a database that replicates”
and then follow up explaining what we mean, and *then* add a few more
of the SECONDARY features that you particularly like.

* https://dl.dropboxusercontent.com/u/82149/CouchDB-in-25-Minutes.pdf
  Full talk at: http://vimeo.com/62599420 (sorry this one is German,
  still trying to find an English version of this)

I want people who have barely looked at CouchDB to comment on an unrelated
Hacker News thread and write “…CouchDB is a database that replicates, maybe
that is a better fit for your problem”.

I want the CTO of the newly funded startup to think “I seem to have
a replication problem to solve, maybe CouchDB can help.”

I want to move CouchDB’s development forward, and when we ask ourselves
whether to add a feature, we run it by our PRIMARY feature set and ask
“does it support ‘CouchDB is a database that replicates’” and if it does
we go ahead and build it, and if it doesn’t we may consider it as a
SECONDARY feature, or we discard it altogether.

(I don’t actually care what the final slogan will be, and please bike-shed
 this to no end, but it should capture what I mean with “CouchDB is a
 database that replicates”, a phrase that we can burn into everybody’s
 head that captures CouchDB’s PRIMARY feature, its PRIMARY value
 proposition, the ONE thing that explains WHY we are excited about
 CouchDB.)


* * *

Now, you might be miffed that your pet feature didn’t make the PRIMARY list.
Do not worry, I believe I have a solution for that.

I have brought this up before, but I really do think the holy grail to all
this is a very well done plugin system that allows us to follow the “small 
core, massive plugin repository” paradigm that others ever so successfully
pioneered.

This allows us to focus on what CouchDB is for internal and external
communication, for roadmap discussions and attraction of developer talent.

More importantly, it allows us to keep all the fringe things that make
CouchDB so very appealing to a lot of different people. It also allows us
to open up development to people who feel intimidated working on core
CouchDB, but can easily write a little plugin or three (this is basically
me, I have like 20 branches on GitHub that are useful to maybe 5% of our
users and they don’t get used at all).

A wise person once said “Core is where features go to rot.”, and if you
look at a number of CouchDB features, you can see that we suffer from that.

We need a kick-ass plugin system that allows us to easily create, publish,
maintain and update little pieces of code that allow our users to make
their CouchDB their own. (I am signing up to build that, but I will need
your help, there is a shit ton of work to do :)


* * *

ALERT: OPINION (your opinion may differ and we need to hear it)

There is a discussion we need to have about what the “small core” means for
CouchDB. There is a discrepancy between the absolute minimum to fulfil
the “CouchDB is a database that replicates” mantra and what would be
a useful-out-of-the-box product that our users could set up and be
productive with.

My minimum set looks roughly like this:

 - core database management (crud dbs & json/mime-docs, clustering)
 - remote & local replication
 - MR-views & GeoCouch enabled by default (ideally abstracted
   away with nice “query dsl”)
 - HTTP interface
 - Futon/Fauxton
 - configuration
 - stats
 - docs
 - plugin system with Erlang (and in the future JavaScript support
   via Node.js)

This makes for a useful CouchDB default setup.

Everything else should be a plugin. A piece of code that can be installed
with a quick search and a click of a button in Futon (or a `curl`-call on
the HTTP interface). Not far away, definitely not “siberia” (if you get
the PHP reference), but close to the core and encouraged to be used.

And yes, this explicitly includes things like shows and lists and update
functions and rewrites and vhosts. We should make it super simple to add
these, but for a default experience, they are very, very confusing. We
should have a single plugin “CouchApp Engine” which includes Benoit’s
vision of CouchApps done right that is just a click away to install.

In terms of highlighting the strengths of the core CouchDB “product”, this
is what I’d put on the website:

  - Apache CouchDB implements the CouchDB vision:
    It is a database that replicates.

  - Document Database:
    - Data records are standard JSON.
    - Unlimited Binary data storage with attachments.
    - (alternatively arbitrary mime docs with special rules for JSON docs)

  - Fault-tolerant:
    - Data is always safe. Tail-append storage ensures no messing with
      already committed data.
    - Errors are isolated, recovery is local and doesn’t affect other
      parallel requests.
    - Recovery of fatal errors is immediate. There is no “fixup phase”
      after a restart.
    - Software updates and bugfix deployment without downtime.

  - Highly Concurrent:
    - Erlang makes good use of massively parallel network server
      installations.
    - Garbage collection happens roughly on a per-request basis.
      GC in one request doesn’t affect other requests.

  - Cluster / BigCouch / Big Data:
    - Includes a Dynamo-style clustering and cluster-management
      feature that allows spreading data and load over multiple
      physical machines.
    - Scales up to Petabytes of data.

  - Secondary 2D and 3D indexing
    - Using incremental and asynchronous index updates for
      high-performance queries.

  - Makes good use of hardware:
    - Tail-append storage allows for serial write access to 
      storage media, which is a best-case-scenario for spinning
      disks and SSDs.

  - Small Core & Flexible Plugin System:
    - Some features are only useful for a small group of people, these 
      can be installed with a super simple plugin management system that
      is built into the admin interface.
    - Get new features with a click or tap.
    - Plugins can be written in Erlang (and in JavaScript in the future).

  - Cross Platform Support
    - Runs on any POSIX UNIX as well as Windows.
    - Support for some embedded devices like Android and RaspberryPi.


I think this would make for a compelling list of technical features.
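
(For anyone unfamiliar with the tail-append idea from the list above,
 here is a toy sketch of why it keeps committed data safe. This is not
 CouchDB's file format or code; the class and its methods are invented
 for illustration. The toy also rescans the whole log on recovery for
 simplicity, which a production store would not want to do.)

```python
import json


class AppendOnlyStore:
    """Toy append-only store: updates never overwrite earlier bytes."""

    def __init__(self):
        self.log = bytearray()  # stands in for the database file
        self.index = {}         # doc id -> (offset, length) of latest record

    def put(self, doc_id, doc):
        record = json.dumps({"id": doc_id, "doc": doc}).encode("utf-8") + b"\n"
        offset = len(self.log)
        self.log += record      # writes only ever happen at the tail
        self.index[doc_id] = (offset, len(record))
        return offset

    def get(self, doc_id):
        offset, length = self.index[doc_id]
        return json.loads(self.log[offset:offset + length])["doc"]

    @classmethod
    def recover(cls, raw):
        """Rebuild the index from complete records; drop a torn final write."""
        store = cls()
        offset = 0
        for line in bytes(raw).splitlines(keepends=True):
            if not line.endswith(b"\n"):
                break           # incomplete record at the tail: ignore it
            entry = json.loads(line)
            store.log += line
            store.index[entry["id"]] = (offset, len(line))
            offset += len(line)
        return store
```

An update to a document appends a new record and moves the index
pointer; the old bytes stay where they are. A crash in the middle of a
write can only damage the unfinished record at the very end.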

(I’d probably also add a blip about the ASF and the Apache 2.0 License
 for good measure)

ALERT END


* * *

And then, CouchDB is one more thing. CouchDB isn’t just the Erlang
implementation of this whole replicating database idea. CouchDB is also
the wire protocol, the specification that makes all the magic work.
Apache CouchDB is the focal point for The Replicating Society*.

(* cue your Blade Runner jokes)

Apache CouchDB is THE standard for data freedom and exchange and is
the clearing house, the centre for an ecosystem that includes fantastic
projects like PouchDB and the TouchDBs, Max Ogden’s `dat` and whichever
else follow these. Not saying we merge those projects in, they can stand
on their own, but we should embrace everything that makes the
interoperable replication world a reality.

http://couchdb.apache.org is going to be the centre of the data
replication universe.


* * *

Now all of this is my vision and I am bringing it to this table now.
I have to admit that I am very nervous about this. A lot of things
aren’t very well thought out and at the same time, I care very
deeply about this project and its community and their future, so
there is a little anxiety doing this little emotional striptease
in front of all of you.

What we will end up with, is not what I dream up and that’s that,
but I hope I can inform and set the direction of where we are going,
and then we can all together figure out the hard parts, and question
my assumptions and change a little or a lot.

I don’t want to make this mine, but ours. To keep and to be proud of.

The last thing I want is to stifle diversity, in thought and code,
and I am very sure that some of you will find a lot to disagree with
what I am saying, and that’s great, because this should, again, be
ours, not mine.

But the one thing I am convinced of is that the little pivot this
project hinges on*, between relative obscurity and blasting success,
is finding our version of a simplified, streamlined and aligned way
of defining, building and communicating what Apache CouchDB is.

(* I suck at metaphors)

And yes that means that some things that *YOU* think are important
are getting a second row seat instead of the front row. Heck even
some of my pet features get a second row seat, but that is fine
because they aren’t gone, there is still room for all the crazy
and not-so-crazy-but-not-essential stuff that people love in the
plugin system, one click away. All this so we can benefit from
being able to focus on building a modern, compelling, fun, humble
and clever database that we can build the future, our future, on.


* * *

I want to live in a world where people are empowered to understand
and are able to decide where their data lives.


I want to live in a world where technology solves more problems than
it creates.


My primary motivation for working on Apache CouchDB is to help build
the world I want to live in.


The ONE feature that makes CouchDB relevant is multi-master replication.


I want to learn from understanding what the PRIMARY and SECONDARY
features for CouchDB are.


Apache CouchDB is the focal point for The Replicating Society.


I don’t want to make this mine, but ours. To keep and to be proud of.


* * *


CouchDB is a database that replicates.

I’m excited about your feedback! <3

Sincerely,
Jan
--






Thanks to Noah for kicking off this way overdue discussion.


On Jul 24, 2013, at 15:28 , Noah Slater <nslater@apache.org> wrote:

> Okay, here are some rough thoughts.
> 
> Why?
> 
> - We believe that distributed data should be easy
> 
> How?
> 
> - Painless multi-master replication
> - Effortless clustering and sharding
> - Co-location of data, queries, and views
> - Deep browser and platform integration
> - Built of the Web
> 
> What?
> 
> - Erlang
> - HTTP
> - JSON
> - JavaScript
> - MapReduce
> 
> (That last list could go on, and on, and on...)
> 
> Anyway. This is just a rough sketch of the sort of hierarchy I am thinking
> about.
> 
> Whatever this ends up looking like, I think this is how we should talk
> about CouchDB. This structure could be a template for anything. A talk, a
> sales pitch, the homepage itself. The important thing is that we start from
> "why?" and we build up from foundations.
> 
> 
> On 24 July 2013 13:15, Noah Slater <nslater@apache.org> wrote:
> 
>> I'm trying to imagine what our "I have a dream" speech would be like for
>> CouchDB. If we were the Wright brothers, we might stand up and say "I have
>> a dream that one day man will fly." We might say, "I have a dream that
>> distributed data will be easy." (I mean, that about covers it, right?
>> Doesn't have to be complex. The hard part is making sure we actually focus
>> in on the root dream we all have.)
>> 
>> Jan mentioned a few months ago that CouchDB almost wants to be the Git,
>> for databases. What is Git? What would Git's "dream" be? I can imagine
>> Linus saying "I have a dream that distributed version control will be
>> easy." Same sorta thing, right?
>> 
>> 
>> On 24 July 2013 13:06, Noah Slater <nslater@apache.org> wrote:
>> 
>>> Benoit,
>>> 
>>> You should defo watch that video and see what you think. Note that it
>>> does not matter if we are a company. This insight applies to companies,
>>> products, loose groups of people working towards one thing (like the Wright
>>> brothers) and even individuals. (i.e. What is your personal "why" and how
>>> are the things you are doing working towards that.)
>>> 
>>> I also want to put you at ease by saying that having a single shared
>>> "why" doesn't mean that anybody's vision, or personal goals have to be left
>>> by the wayside. People can still come to the project with their own goals,
>>> and their own perspective. But the project itself should have a clear sense
>>> of what we are trying to accomplish.
>>> 
>>> I think the "why" we come up with can easily be something that inspires
>>> and is important to the Hoodie peeps, the Kanso peeps, the CouchApp peeps,
>>> the "big data" peeps, the mobile platform peeps. Think about a why that
>>> might evolve out of "your data, everywhere". Who (in our existing
>>> communities) wouldn't love that and want to rally behind that? (But this is
>>> just one idea.)
>>> 
>>> Asking "what are the core features" misses the point. Why are these core
>>> features? Why did we add them in the first place? What are we working
>>> towards? See, you hit on it in your final sentence: "relax we take care
>>> about your data and the way you exchange and render them wherever they
>>> are". This! This is the kind of thing that I think we should hone, and
>>> figure out, and document.
>>> 
>>> Once we have that, it can inform our "how". When we're talking about
>>> features, about product direction (i.e. what we add, what we subtract) we
>>> can say "well, how is this related to what we're trying to do here?" Do you
>>> see what I mean? :)
>>> 
>>> "Painless distributed systems" is also a step in the right direction for
>>> answering the question "why?"
>>> 
>>> So far we have:
>>> 
>>>    * Relax
>>>    * Decentralised web
>>>    * Peer-to-peer replication of apps and datasets
>>>    * Your data, everywhere
>>>    * Put the data where you need it
>>>    * We handle your data / you handle display
>>>    * Painless distributed systems
>>> 
>>> Somewhere in here ^ (and perhaps in a follow up reply) is a single shared
>>> value system. Something we all hold dear.
>>> 
>>> 
>>> 
>>> 
>>> On 24 July 2013 12:48, Benoit Chesneau <bchesneau@gmail.com> wrote:
>>> 
>>>> Anyway, CouchDB is not like Apple or Dell. This isn't a company. And we
>>>> don't all have to share the same vision, but only common values, a core.
>>>> I'm not sure it fits into what you describe. What kind of vision are you
>>>> speaking about?
>>>> 
>>>> Also I would remove any pro-tip from your mail if we want to start from a
>>>> neutral base.
>>>> 
>>>> CouchDB is known for replication, but not only that. CouchApps and the way
>>>> people hack around them is another (Hoodie, Kanso, erica/couchapp: all
>>>> different visions of what a CouchApp is, but all are using CouchDB the
>>>> same). Message hub is another (Nodejitsu and Hoodie are using CouchDB as
>>>> a message hub somehow; not only that, but a lot of their architecture is
>>>> based on changes). And now we can add some kind of big data handling. Not
>>>> forgetting people that are using Apache CouchDB on their mobile: they
>>>> exist, and the patches will be released.
>>>> 
>>>> All have different visions. But they share some common features. I don't
>>>> want to forget someone because of the vision of a few. I only know that
>>>> CouchDB has some strong features that could be improved.
>>>> 
>>>> All that to say that rather than thinking about a vision, maybe we could
>>>> collect all the usages around and see what emerges from them. What are the
>>>> core features? What should CouchDB focus on and iterate on depending on
>>>> new usage? I guess it's some kind of philosophy: "relax we take care
>>>> about your data and the way you exchange and render them wherever they
>>>> are".
>>>> 
>>>> - benoit
>>>> 
>>>> 
>>>> On Wed, Jul 24, 2013 at 1:24 PM, Noah Slater <nslater@apache.org> wrote:
>>>> 
>>>>> Hi devs,
>>>>> 
>>>>> I came across this video recently:
>>>>> 
>>>>> Simon Sinek: How great leaders inspire action
>>>>> 
>>>>> http://www.ted.com/talks/simon_sinek_how_great_leaders_inspire_action.html
>>>>> 
>>>>> In it he sets out what he calls the Golden Circle:
>>>>> 
>>>>> Why
>>>>> 
>>>>>    - What's your purpose?
>>>>>    - What's your cause?
>>>>>    - What's your belief?
>>>>> 
>>>>> How
>>>>> 
>>>>>    - How do we do it?
>>>>>    - How does our product differentiate?
>>>>>    - How are we different?
>>>>>    - How are we better?
>>>>> 
>>>>> What
>>>>> 
>>>>>    - What do we do?
>>>>>    - What do we make?
>>>>> 
>>>>> He points out the difference between companies like Apple and
>>>>> companies like Dell.
>>>>> 
>>>>> Dell tells you what they do, and how. "We make great computers. They're
>>>>> well designed and work well. Wanna buy a computer?" Most companies do it
>>>>> like this. But they often miss out the "why".
>>>>> 
>>>>> But then you look at Apple, and they do it the other way around. Apple
>>>>> tell you what their purpose is. The rest is almost an afterthought. "We
>>>>> believe in challenging the status quo. We believe in thinking different.
>>>>> We do that with great design and a focus on the user experience. We just
>>>>> happen to make computers." He then jokingly quips: "Ready to buy one yet?"
>>>>> 
>>>>> (His talk gives several other examples, with his thesis being that
>>>>> telling your story from the inside out is what separates all the great
>>>>> companies and leaders. One of his main examples is the Wright brothers.)
>>>>> 
>>>>> He comments that if you talk about what you believe, you will attract
>>>>> those that believe what you believe. That when you talk about what you
>>>>> believe, people will join you for their own reasons, for their own
>>>>> purpose. And that what you do simply serves as proof of what you believe.
>>>>> Or as he quips: "Martin Luther King gave his 'I have a dream' speech, not
>>>>> his 'I have a plan' speech."
>>>>> 
>>>>> Why am I bringing this to the dev list?
>>>>> 
>>>>> Because our message stinks. "Apache CouchDB™ is a database that uses
>>>>> JSON for documents, JavaScript for MapReduce queries, and regular HTTP
>>>>> for an API" is a terrible way to introduce who we are, what we stand
>>>>> for, and why we build this thing. (And I'm allowed to say all that,
>>>>> because I'm the one who wrote it, with lots of help from Jan.)
>>>>> 
>>>>> So what am I proposing? I'm proposing that we figure out our why. That
>>>>> we figure out what we stand for, what we believe in. And then we figure
>>>>> out how we're gonna do that (pro tip: replication is more important
>>>>> than the data format we use). Not only will this define a consistent
>>>>> internal vision for the project (what *are* we working towards anyway?)
>>>>> but it will help us to attract people who believe in what we believe.
>>>>> 
>>>>> So, if you have any thoughts about this, speak up!
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> --
>>>>> NS
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> NS
>>> 
>> 
>> 
>> 
>> --
>> NS
>> 
> 
> 
> 
> -- 
> NS

