couchdb-dev mailing list archives

From Reddy B. <>
Subject CouchDb Rewrite/Fork
Date Tue, 09 Jul 2019 23:07:40 GMT
Hi all,

I've checked the recent discussions and apparently July is the "vision month" lol. Hopefully
this email will not try the patience of the core team.

We have been thinking about forking/rewriting CouchDb internally for quite some time now,
and the idea has matured to the point that I'm pretty confident it will materialize. We
hesitated between building it internally and then making a big open-source announcement
5-10 years from now, once the product is battle-tested, and announcing our intentions here
today.

However, I realized that sharing this feedback may do some good, and that this type of
feedback is also a way of giving back to the community.

The reason for this project is that we have lost confidence that the vision of CouchDb
aligns with our goals. As far as we are concerned, there are 3 things we loved about CouchDb:


We think that the benefits of Map/Reduce are very underrated. Map/Reduce forces developers
to approach problems differently and results in much more efficient and well-thought-out application
architectures and implementations. This is in addition to the performance benefits, since indexes
are built in advance in a very predictable manner (with a few well-documented caveats). For
this reason, our developers are forbidden from using Mango, and we require them to wrap their
heads around problems until they are able to solve them in map/reduce mode.

However, we can see that the focus of the CouchDb project is increasingly on Mango, and we
have little confidence in the project's commitment to first-class Map/Reduce support (while
this was, for us, a defining aspect of CouchDb's identity).

#Complexity of the codebase

Open-source software that is too complex to be tweaked and hacked is, for all practical
purposes, closed-source software. You guys are VERY smart. And by nature, a database
system is a non-trivial piece of technology.

Initially we felt confident that the codebase was small enough and clean enough that, should
we really need to get our hands dirty in an emergency, we would be able to do so.
Then Mango made the situation a bit blurrier, but we could easily ignore that, especially
since we do not use it. However, with FoundationDB... this becomes a whole different story.

The domain model of a database is non-trivial by nature, and now FoundationDB will introduce
an additional level of abstraction and indirection, and a very serious one. I've been reading
the design discussions since the FoundationDB announcement, and there are a lot of impedance
mismatches requiring the domain model of CouchDb to be broken up into fictitious entities intended
to accommodate FoundationDB abstractions and their limitations (I'll come back to this point in
a moment).

Indirection is also introduced at the business logic level, with additional steps that must
be followed to emulate the desired behavior. All of this adds complexity and obscurity,
and realistically, if we already struggled with the straight-to-the-point implementation,
there is no way we'll be able to navigate (let alone hack) the FoundationDB-based implementation.

#(Apparent) Non-Alignment of FoundationDB with the reasons that made us love CouchDb

FoundationDB introduces limitations regarding transactions, document sizes and a number of
other critical items. One of the main reasons we use CouchDb is that it allows
us to develop applications rapidly and flexibly address all the state storage needs of application
layers. CouchDb has you covered if you just want to dump large media files streamed with HTTP
range requests while you iterate fast and your userbase is small, and replication allows you
to scale seamlessly by distributing load across clusters in advanced ways without needing to redesign
your applications. The user nkosi23 nicely describes some of the new possibilities enabled
by CouchDb:

However, the limitations introduced by FoundationDB, and the spirit of that project, which
favors abstraction purity through aggressive constraints over operational flexibility, are the
opposite of the reasons we loved CouchDb and believed in it. To us, the writing is pretty clearly
on the wall. We aren't confident that FoundationDB will cover our bases, since covering our
bases is explicitly not the goal of that project, and its spirit is different from what
has made CouchDb unique (ease of use, simple yet powerful and flexible abstractions, etc.).

#Lack of commitment to the ideas pioneered

We feel like CouchDb itself undervalues the wealth of what it has brought to the table. For
example, when it comes to architecting load balancing for all sorts of applications with a
single and transparent value store, CouchDb enables things that simply weren't possible before,
and people will need time to understand how they can take advantage of them.

Nowadays we see sed, awk and similar tools used in pretty clever ways, but it took time for
people to incorporate the possibilities enabled by these tools into their thinking (even
though system administration tools are much easier to deploy than enterprise applications).

I think that CouchDb should have a 10- or 20-year outlook on the paradigm shifts it introduces.
There is a need to give more room to faith and less to data, since not every usage will
be adopted within 3 years. Sometimes you need to do things because you believe in them,
you know you are right, and you know that eventually people will come. But right now, it feels like
customer statistics from Cloudant have become the main driver of the project. A balance can probably
be found between business realities and evangelism realities. I feel the IBM
folks are totally right to share their insights, but if there are no faith zealots to counterbalance them,
a positive may become a negative.

#What we plan to do

For all these reasons, CouchDb 3 will likely be the last release we use. What we are
about to start is an effort to rewrite CouchDb to focus on the use case that we think makes
CouchDb unique: a one-stop shop for all data storage needs, no matter the type of application
and load. This means focusing, on the one hand, on working seamlessly with extremely large
attachments and documents of any size, and on the other hand on replication features (the two
go hand in hand).

We will also seek to resurrect old features such as list views that we think need long-term
faith. To make it possible from a bandwidth perspective, we will make a number of radical
decisions. The two most important ones may be the following:

- Only map/reduce will be supported. Far from seeing this as a limitation, we see it as a way of life and
a different way of thinking about designing line-of-business applications. Our finding is
that a line-of-business application never needs SQL-style flexibility for the main app if
the problem space has been correctly modeled (instead of being Excel in the web browser).
When business analytics are really needed, the need is always very localized, and it is nowadays
easy enough to have an ETL pipeline on a separate instance (especially considering CouchDb's
filtered replication capabilities).
- Rewrite CouchDb in F#.

Rewriting in F# will provide all the benefits of functional programming, while giving
us access to a rich ecosystem of libraries and a great static type system. All of
this will mean more time to focus on the core features.

That is pretty much the plan in a nutshell. It is still early days, and given the way we do things,
we would typically roll it out internally for a number of years before announcing it to the
public. So I think there will likely be a 10-yearish window before you hear about this again.

I simply wanted to provide our feedback as a friendly contribution.
