couchdb-dev mailing list archives

From Reddy B. <redd...@live.fr>
Subject Re: CouchDb Rewrite/Fork
Date Sat, 13 Jul 2019 01:59:43 GMT
Hi Jan,

Thank you so much for your comprehensive reply, I have also read your parallel remarks in
the IoT thread which also helped understand the current vision better.

I totally understand your points, there are certain aspects that we may have overlooked, misunderstood
or not looked into in enough detail.

Overlooked with regard to the fact that there may be a will but also technical/performance challenges.
Misunderstood with regard to the background story of dropped features, and not looked into
in enough detail with regard to what would be the best path to help move things forward.

At the very least, what you made clear is that the situation is less black and white and more
nuanced than it may seem and I am thankful to you for taking the time to explain it.

We remain strongly curious about exploring what would be the best course of action to address
our fears and needs, so I think what would be a productive course of action for us would be
to on the one hand study the current implementation in more depth, and on the other hand,
prototype with our ideas to see the actual value this would bring to the table when practical
costs and challenges are factored in (in addition to checking if this is a realistic path).

As a remark, the reason I said F# is not to suggest that this is the best language, or to
start the classic holy war that one language would solve every problem, or that it is time
to rewrite everything with something trendier. If we go ahead we would use F# because we are
a Microsoft shop and we are both very familiar and very invested with the .NET ecosystem.
I mentioned F# to echo our maintenance/tweaking concerns (but I probably should have mentioned
that we are a Microsoft shop to make this point clearer - you had no way to guess it from
my message).

In short, the reasoning is: if we are to undertake such a huge effort, let's also bring things
to an environment with which we are familiar, especially since adequate tooling exists there.
This is for internal reasons, not for absolutism / flavor of the month reasons. I mentioned
it to provide you a data point on the technology stacks of your users, but once again I went
too fast on that part.

We have definitely no desire to undertake such an effort if it wouldn't be productive, or
if there are better ways to address our needs and concerns while at the same time contributing.
We are not interested in getting into the database business, we are only looking to hedge
our risk and reap productivity dividends as a bonus.

Now the thing is we are curious people and funding isn't necessarily a problem for us. And
compared to most companies we have a strong appetite for reinventing the wheel if it makes
us more efficient and/or less exposed to risk. So I think what we'll be doing is get our hands
dirtier, and then review everything in light of your very important remarks to see what would
be the most productive course of action for investing our efforts, both for the community and
for us (considering our needs, but also our comfort zone and realistic abilities).

Because that's really the data point I tried to convey. My message was less about "I think
the project is wrongly headed" (which is not what I think, at most I wonder if it is still
aligned with our goals and Jan's message was very helpful in this regard), than it was about
saying: "here are our limitations, and fears, and the compromise we are exploring to address
them. This is purely related to where we come from and not to absolutism but I feel this data
point may be useful to the project".

Thanks very much for your answers, I think we'll be busy for a little while exploring options
in more depth and this information will be very valuable to help us find the best compromise
/ way to invest our resources.

________________________________
From: Jan Lehnardt <jan@apache.org>
Sent: Thursday, 11 July 2019 11:26
To: dev@couchdb.apache.org
Subject: Re: CouchDb Rewrite/Fork

Hi Reddy,

this is all pretty good feedback, thanks for taking the time to put this down.


> On 10. Jul 2019, at 01:07, Reddy B. <reddy.b@live.fr> wrote:
>
> Hi all,
>
> I've checked the recent discussions and apparently July is the "vision month" lol. Hopefully
this email will not saturate the patience of the core team.

I’m announcing my 2030-plan to rewrite CouchDB in bash ;D

> We have been thinking about forking/rewriting CouchDb internally for quite some time
now, and this idea has reached a degree of maturity such that I'm pretty confident it will
materialize at this point. We hesitated between doing our thing internally to then make our
big open-sourcing announcement 5-10 years from now when the product is battle tested, and
announcing our intentions here today.
>
> However, I realized that good things may happen by providing this feedback, and that
providing this type of feedback also is a way of giving back to the community.
>
> The reason for this project is that we have lost confidence in the way the vision of
CouchDb aligns with our goals. As far as we are concerned, there are 3 things we loved with
CouchDb:
>
> #Map/Reduce
>
> We think that the benefits of Map/Reduce are very underrated. Map/reduce forces developers
to approach problems differently and results in much more efficient and well-thought-out application
architectures and implementations. This is in addition to the performance benefits since indexes
are built in advance in a very predictable manner (with a few well-documented caveats). For
this reason, our developers are forbidden from using Mango, and we require them to wrap their
heads around problems until they are able to solve them in map/reduce mode.
>
> However, we can see that the focus of the CouchDb project is increasingly on Mango, and
we have little confidence in the commitment of the project to first-class citizen Map/Reduce
support (while this was for us a defining aspect of the identity of CouchDb).

Aside from a TBD point for how to support custom reduces in FDB (for which
several folks have at least outlined a theoretical approach), MapReduce isn’t going anywhere.
It is the foundation of how we can ensure that Mango remains scalable to the needs we want.


> #Complexity of the codebase
>
> An open-source software that is too complex to be tweaked and hacked is for all practical
purposes closed-source software. You guys are VERY smart. And by nature a database software
system is a non-trivial piece of technology.
>
> Initially we felt confident that the codebase was small enough and clean enough that
should we really need to get our hands dirty in an emergency situation, we would be able to
do so. Then Mango made the situation a bit blurrier, but we could easily ignore that, especially
since we do not use it. However with FoundationDB... this becomes a whole different story.

I’m not sure how. Mango is literally just a self-contained module in the CouchDB module
list. I don’t want to be flippant about this, but I really don’t understand how another
folder in here makes any sort of a difference:

> ls src/
b64url                  couch_mrview            ets_lru                 jiffy                   rexi
bear                    couch_peruser           fabric                  ken                     setup
chttpd                  couch_plugins           fauxton                 khash                   smoosh
config                  couch_pse_tests         folsom                  mango                   snappy
couch                   couch_replicator        global_changes          meck                    triq
couch_epi               couch_stats             hqueue                  mem3
couch_event             couch_tests             hyper                   mochiweb
couch_index             ddoc_cache              ibrowse                 proper
couch_log               docs                    ioq                     rebar

With very few changes, you could delete that src/mango/ directory and have
a mangoless couch.

There are some complexities in the CouchDB codebase, but most if not all are in the src/couch/
submodule that is essentially 1.x plus some new stuff, all the new modules outside of src/couch
are relatively well-defined and self-contained.

* * *

>
> The domain model of a database is non-trivial by nature, and now FoundationDb will introduce
an additional level of abstraction and indirection, and a very serious one. I've been reading
the design discussions since the FoundationDb announcement and there are a lot of impedance
mismatches requiring the domain model of CouchDb to be broken up in fictitious entities intended
to accommodate FoundationDb abstractions and their limitations (I'll get back to this point in
a moment).
>
> Indirection is also introduced at the business logic level, with additional steps needing
to be followed to emulate the desired behavior. All of this is complexity and obfuscation,
and to be realistic, if we already struggled with the straight-to-the-point implementation,
there is no way we'll be able to navigate (let alone hack), the FoundationDB-based implementation.

No argument here about the additional layer and impedance mismatch. But as outlined above,
the FoundationDB change will get rid of all the code that is most gnarly in CouchDB today
and replace it with a mature software project. The modules on top are going to be very lightweight
in comparison.


>
> #(Apparent) Non-Alignment of FoundationDb with the reasons that made us love CouchDb
>
> FoundationDb introduces limitations regarding transactions, document sizes and a
number of other critical items. One of the main reasons we use CouchDb is because of the way it
allows us to develop applications rapidly and flexibly address all the state storage needs
of application layers. CouchDb has you covered if you just want to dump large media files streamed
with HTTP range requests while you iterate fast and your userbase is small, and replication
allows you to seamlessly scale by distributing load on clusters in advanced ways without needing
to redesign your applications. The user nkosi23 nicely describes some of the new possibilities
enabled by CouchDb:
>
> https://github.com/apache/couchdb/pull/1253#issuecomment-507043600
>
> However, the limitations introduced by FoundationDb and the spirit of their project, favoring
abstraction purity through aggressive constraints over operational flexibility, are the opposite
of the reasons we loved CouchDb and believed in it. It is to us pretty clear that the writing
is on the wall. We aren't confident in FoundationDb to cover our bases, since covering our
bases is explicitly not the goal of their project and their spirit is different from what
has made CouchDb unique (ease of use, simple yet powerful and flexible abstractions etc...).

As Alex points out, we can talk about all this. But this line of reasoning also conveniently
ignores another reality. Yes, while CouchDB allows you to store multi MB JSON docs, and GB’s
worth of attachments, it will require a lot of computing resources if you are making a lot
of use out of this. Disproportionately so, in comparison with other solutions (e.g. an abstraction
layer that allows you to have smaller docs and that has binary storage outside of CouchDB).

On the flip-side, one of CouchDB’s goals, and main attractions is this: it grows with your
needs, you don’t have to rewrite your app as usage grows. You should be able to go from
a single-node CouchDB to a three-node cluster, to a ten-node cluster and however far your
business or project takes you.

Today, CouchDB can’t fulfil this, because it doesn’t have limits that are similar to the
FDB-imposed ones, especially around document and attachment sizes. So the question is this:
regardless of FDB, should CouchDB impose limits on resources to ensure its original vision
about scaling, or should it bend over backwards to support things that no sensible database
should support (I jest, of course)?

As my day job includes making people successful with their CouchDB installations, I’m happy
to continue to charge hourly for telling them to make their docs smaller, but really, I’d
like for folks to be successful on their own, because that means we’ll get more users.

I have some concerns about transaction lengths and getting consistent snapshots out of an
FDB-Couch with a one-shot _changes request, as we support today, but I hear that with the
new storage engine coming in that at least becomes a little easier to consider.

All that said, for the people who truly can’t move to a FDB-CouchDB world, we are currently
working hard on making CouchDB 3.x the absolute best it can be, so that it can be used for
a long time going. If a significant part of this community finds itself staying on 3.x, I
can promise, with enough contributions, we can even support it for a long time going forward.
But we won’t be able to, if we don’t get enough folks pitching in.

>
> #Lack of commitment to the ideas pioneered
>
> We feel like Couchdb itself undervalues the wealth of what it has brought to the table.
For example when it comes to architecting load balancing for all sorts of applications with
a single and transparent value store, CouchDb enables things that simply weren't possible
before, and people will need time to understand how they can take advantage of them.
>
> Nowadays we can see sed, awk and such be used in pretty clever ways, but it took time
for people to incorporate the possibilities enabled by these tools in their thinking process
(even though system administration tools are much easier to deploy than enterprise applications).

I don’t really understand what you are referring to here. What exactly did CouchDB pioneer
that we are throwing away?


> I think that CouchDb should have a 10 or 20-year outlook on the paradigm shifts it introduces,
there is a need to give more place to faith and less place to data since not every usage will
be adopted within 3 years.

The ones that we are consciously shedding have been a relative dud since ~2012 (CouchApps).
We’ve tried numerous times to rally the remaining enthusiasts around providing a modern
implementation of that, the sorry history of which you can read up on in the couchapp@ mailing
list. If there were end-users interested in this, that would lead to more folks wanting to
build and maintain a modern version of CouchApps, that we’d happily support as part of CouchDB,
but despite rallying a number of times in the past 8 years, it hasn’t come together.

I understand that some enthusiasts are extremely excited about CouchApps, trust me, I’ve
been one of them. And I agree that we pioneered a number of things that are the new normal.
I can draw a straight line from CouchDB to Node.js, Docker, Kubernetes and WASM, and while it
might not all be connected perfectly, we’ve been pointing at where we are today way before
anyone else. But that also means, we didn’t end up being the ones getting big with this.
We started at a time when JS was still considered icky, not a staple of modern backend development.
We were the first JSON/HTTP database, etc. We didn’t have the resources to keep up with
that world, so we are focussing on the things we know we can achieve, and that leads to some
tough decisions that we made some time around 2014, and communicated here repeatedly. I’m
done arguing this point until someone comes with working code and commitment.


> Sometimes you need to do things because you believe in them and you know you are right
and that eventually people will come. But right now, it feels like customer statistics from
Cloudant have become the main driver of the project. A balance can probably be found between
aligning with business realities and evangelism realities. I feel IBM guys are totally right
to share their insights, but if there are no faith-zealots to counter-balance, then a positive
may become a negative.

IBM’s influence here is undeniable, but I’m not seeing one instance of anything being
negative. The work that I and the folks at Neighbourhoodie (including Wohali) do is different
enough from what IBM is doing that I have high hopes that we bring important perspectives
to the project that force us to find middle ground. I’d love more perspectives, but “let’s
please stay in 2007” is not going to work.

Best
Jan
—

>
> #What we plan to do
>
> For all these reasons, CouchDb 3 will likely be the last release we will use. What we
are about to activate is an effort to rewrite CouchDb to focus on the use case that we think
makes CouchDb unique: a one-stop shop for all data storage needs, no matter the type of application
and load. This means focusing, on the one hand, on working seamlessly with extremely large
attachments and documents of any size, and on the other hand on replication features (which go
hand in hand).
>
> We will also seek to resurrect old features such as list views that we think need long-term
faith. To make it possible from a bandwidth perspective, we will make a number of radical
decisions. The two most important ones may be the following:
>
> - Only map/reduce will be supported. Far from a limitation we see this as a way of life
and a different way of thinking about designing line-of-business applications. Our finding
is that a line-of-business application never needs SQL-style flexibility for the main app
if the problem space has been correctly modeled (instead of being Excel in the web browser).
When Business Analytics are really needed, the need is always very localized, and it is nowadays
easy enough to have an ETL pipeline on a separate instance (especially considering CouchDb
filtered replication capabilities).
> - Rewrite CouchDb in FSharp.
>
> Rewriting in Fsharp will provide all the benefits of functional programming, while giving
us access to a rich ecosystem of libraries, and a great static type checking system. All of
this will mean more time to focus on the core features.
>
> This is, in a nutshell, pretty much the plan. This is still early stages, and the way we do
things, we would typically roll it out internally for a number of years before announcing
it to the public. So I think there will likely be a 10-yearish window before you hear about
this again.
>
> I simply wanted to provide our feedback as a friendly contribution.

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/

