couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Kowalski <>
Subject [PROPOSAL] Improving the quality of CouchDB
Date Mon, 07 Sep 2015 14:49:03 GMT
Hi list,

I had my first real programming job in a Web Agency. We were building 
mobile websites for our external customers. The development process 
followed a strict waterfall model, after you finished your project it was 
thrown over the fence to the QA folks, which tested the website manually. 
At some point, when all found bugs were fixed, the product got released. 
Sometimes a bugfix took several iterations of going back and forth between 
QA and dev. Sometimes the QA folks forgot to test if the bug was really 
fixed after the third iteration as humans make errors, especially when it 
is a boring task.

After a lifetime of 2 years, the customer usually wanted a redesign, and 
the next developer throwed everything of the old website away, as not much 
of it was reusable. When something broke during the life of such an 
application/website, the company and developers noticed that by the 
customer calling them.

These days I think often about my time in my first job and the customers 
calling the company regarding bugs. Right now the Erlang core of CouchDB 
reminds me quite often how our customers called us when something broke. 
For CouchDB 2 it is not the users that call us right now, but PouchDB and 
Fauxton when their testsuites gets red or incredible slow.

When I was starting to work on Fauxton we were in a somehow similar 
situation, a merge usually had 2-3 fixes on the same or next day, because 
something somehow broke. A proper review for a change took incredible much 
time and every pull request needed a lot of iterations. The most evil 
changes were the ones where you fixed a bug in one area, and a completely 
different area broke. These bugs were so evil because they were surprising 
and unexpected.

After a real big issue we tried out that every change would need unit 
and/or integration tests for several weeks.

Writing unit tests was very hard these days, most of the code was not 
designed to get tested afterwards. So most of the tests written in that 
time were Selenium based integration tests. This was better than nothing 
but they were slow. Nevertheless, they caught a lot of bugs, even before 
code was sent to the review. When we made a switch to React and a different 
architecture for our components and their communication, better and easier 
unit/functional testing was one of the many design goals.

Writing good Selenium tests is incredible hard, as you basically face the 
same problems you have in the distributed systems world in the world of 
modern UIs. State and data changes asynchronously and you never know when 
the result of your request, which changes the UI is returned. It even gets 
harder by putting this on top of an eventually consistent system :)

Sometimes we also pair-programmed in a video call in front of a test where 
coming up with a good test was the problem, as we also had to share and 
gain knowledge in the team how to write tests. One mistake we made was of 
not fixing flaky tests immediately and adding more and more flaky tests and 
just restarting the CI for a few weeks. At some point our CI system just 
collapsed and it took the team a long time to fix those, partly also 
because tests were a new topic and it is hard to understand why Selenium 
tests fail in an async JS application with a cluster beneath it which is 
eventually consistent. Additionally it decreased the confidence in a red 
tests signaling a bug.

After fixing the testsuite in this early crisis we've got pretty stable 
tests. Additionally reviews take much less time. Reviews got even faster 
when we added a pre-check for our coding guidelines to the testsuite. If 
you take a look at a day in the time when we began writing tests, we 
sometimes lost an hour to write those tests, because it was hard. Looking 
just at this short timeframe it felt like we were losing time. But if we 
take a look at a longer timeframe we did not decrease in our speed to 
deliver features for several reasons: less bugfixes needed after code got 
on master, less tickets and communication because of less bugs, faster 
reviews because many bugs are caught before the reviewer takes a look and 
so on. Some people in the team still hate writing tests but also everyone 
on the team realized and also mentions that we have no other options.

Right now I am seeing the same pattern we observed some time ago in Fauxton 
for CouchDB's Erlang core :

 - not noticing something broke (if you break it, will you even notice?)
 - eventually a user in the broader sense notices something broke
 - a lot of patching to fix patches where stuff broke in another place
 - regressions
 - it takes time to review code properly because of a large api surface and 
a lot of functionality

I would like to propose that every PR needs unit, functional AND/OR 
integration tests for CouchDB's Erlang core, too - given it is not covered 
by an existing test that run. Maybe we could try the new approach for 3 
months and then decide if we keep it.

Most of CouchDB's bugs are getting noticed by interacting with the HTTP 
API. A simple integration test that CouchDB compiles and the bug got fixed 
/ the feature works would be enough as a minimum.

Let's take a look what we have for testing now:

- Erlang unit tests
- Erlang functional tests
- integration tests written in Erlang, which use ibrowse [1]

- integration tests in JavaScript which test the backdoor ports

- Fauxton's Integration tests simulating a user and their browsers
- PouchDB's integration tests

So there are a lot of different ways to write tests. One problem for 
testing is our multi-repository approach. The multiple PRs over different 
repos must somehow get united again. Here is how we could solve it as a 
first step [2]. The testsuite is kicked off once all PRs are opened by 
putting the branchnames for the repositories that should not be master into 
the config fields. 

When the testsuite is green another button could appear, to automate the 
merge of all repos involved int he change, but that would we an additional 
feature [3]. 

The reviewer and the submitter of the PR would be both responsible that a 
merge is covered by tests and the testsuite is green.

Important is that it is a first step and depending how creative we get it 
can indefinitely improve, like with Fauxton where a lot changed in the very 
short timeframe since we are requiring tests. The important step is that we 
somehow start at some point, and everybody is welcome and invited (and 
should feel empowered) to make the processes how we write software and 
tests better.

Flaky tests are a problem and should get fixed immediately, as an 
unreliable testsuite does not make much sense (remember the crying wolf kid 
story). We learned that quite painfully in Fauxton.

The change will need some discipline and maybe also stepping temporarily 
out of our comfort zone, but I am certain we will gain speed, improve our 
daily workflows and improve CouchDB's Quality over the long term. I think a 
3 month timeframe to try the new approach would be a good time to find out 
if we are improving or things get worse.

Robert :)

  • Unnamed multipart/mixed (inline, None, 0 bytes)
View raw message