couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "ReleaseNotices" by JanLehnardt
Date Sat, 25 Feb 2012 20:09:30 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "ReleaseNotices" page has been changed by JanLehnardt:
http://wiki.apache.org/couchdb/ReleaseNotices

Comment:
new release notices page

New page:
<<Include(EditTheWiki)>>

= Release Notices =

Sometimes, after we make a release, we might find out that something is wrong with it that
is so severe that we need to tell everyone who runs that release. This page collects these
notices.

<<TableOfContents(2)>>

== 1.0.0 ==

**A 1.0.0 RECOVERY TOOL IS NOW AVAILABLE**

Download the [[http://wiki.couchone.com/page/repair-tool#/|CouchDB 1.0.0 Repair Tool]] to
recover data.


=== Notes on a Nasty Bug ===

Developers should be using the 1.0.1 release at this point, not 1.0.0. Read on
to find out why.

On the weekend of August 7th–8th, 2010, we discovered and fixed a bug in CouchDB 1.0.0. The
problem was subtle (cancelling a timer without deleting the reference to it), but the ramifications
were not: there was potential data loss for users of 1.0.0. The 1.0.1 release contains a permanent
fix and [[../downloads.html|is available now on the download page]].

We are proud of how quickly the CouchDB community recovered from this bug and went the extra
mile to make sure everyone's data was safe. It is clear we have a group of developers who
care enough about all users' data that they aggressively pursued an "edge case" bug so no one
would be caught off guard. Further, the team worked for the next week to create a repair tool
to recover access to data that was affected by the bug. As a result, no users lost data permanently.
Kudos!

=== The Remedy ===

For current users, these instructions will ensure your data is safe. First: **do not restart
your CouchDB!** The hot fix involves changing configuration on the running server, so have
your admin credentials handy. (If your CouchDB is in Admin Party mode with no admins defined,
you won't need admin credentials. If you do not have admin credentials but you can restart
the server, you can still prevent data loss; read on.)

==== If you have admin credentials (or if your CouchDB is in Admin Party mode) ====

Visit the Futon admin console at http://yourserver:5984/_utils/, and click "Login" in the
lower right-hand corner. Log in as an administrator, and visit the "Configuration" page linked
in the sidebar: http://yourserver:5984/_utils/config.html

Now that you are on the configuration page, set `delayed_commits` (in the `couchdb` section)
to `false`. You can do this by clicking on the word `true`, replacing it with `false`, and
pressing Enter.
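
If you prefer to script this step, CouchDB 1.x also exposes its configuration over HTTP at `/_config/{section}/{key}`. A `PUT` whose body is the new value as a JSON string changes that setting on the running server without a restart. The JavaScript below is a minimal sketch that assumes Node.js. The helper name `buildConfigRequest` and its parameters are hypothetical. It only builds the request, so you can inspect it before you send it.

```javascript
// Build (but do not send) the request that sets a CouchDB 1.x config value
// via PUT /_config/{section}/{key}. Hypothetical helper for illustration.
function buildConfigRequest(baseUrl, section, key, value, auth) {
  const base = baseUrl.replace(/\/$/, ""); // tolerate a trailing slash
  const url = `${base}/_config/${encodeURIComponent(section)}/${encodeURIComponent(key)}`;
  const headers = { "Content-Type": "application/json" };
  if (auth) {
    // HTTP Basic auth with admin credentials; omit auth in Admin Party mode.
    const token = Buffer.from(`${auth.user}:${auth.password}`).toString("base64");
    headers.Authorization = `Basic ${token}`;
  }
  // The new value travels as a JSON-encoded string: "false", with the quotes.
  const body = JSON.stringify(String(value));
  return { url, init: { method: "PUT", headers, body } };
}
```

For example, `buildConfigRequest("http://localhost:5984", "couchdb", "delayed_commits", false, { user: "admin", password: "secret" })` builds the request, and in Node.js 18 or later you could send it with `fetch(req.url, req.init)`. The user name and password here are placeholders. Leave out the credentials if your server is in Admin Party mode.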

The next time a document is written to a database, CouchDB will commit that database's header to disk,
and its data will be secure. For safety, please continue with the next set of instructions.

==== For everyone ====

To ensure that each database is committed, you can use the `_ensure_full_commit` command.
There are a few ways to do this.

The simplest method is to right-click the following link and add it to your bookmarks.

Bookmarklet: [[javascript:%%24.couch.allDbs%%28%%7Bsuccess%%3Afunction%%28dbs%%29%%7Bfunction%%20commitDbs%%28list%%29%%7Bvar%%20db%%3Dlist.pop%%28%%29%%3B%%24.ajax%%28%%7Btype%%3A%%22POST%%22%%2Curl%%3A%%22%%2F%%22%%2BencodeURIComponent%%28db%%29%%2B%%22%%2F_ensure_full_commit%%22%%2CcontentType%%3A%%22application%%2Fjson%%22%%2CdataType%%3A%%22json%%22%%2Ccomplete%%3Afunction%%28r%%29%%7B%%24%%28%%22%%23content%%22%%29.prepend%%28%%27%%3Cul%%20id%%3D%%22commit_all%%22%%3E%%3C%%2Ful%%3E%%27%%29%%3Bif%%28r.status%%3D%%3D201%%29%%7B%%24%%28%%22%%23commit_all%%22%%29.append%%28%%27%%3Cli%%3Ecommitted%%3A%%20%%27%%2Bdb%%2B%%27%%3C%%2Fli%%3E%%27%%29%%3B%%7Delse%%7B%%24%%28%%22%%23commit_all%%22%%29.append%%28%%27%%3Cli%%20style%%3D%%22color%%3Ared%%3B%%22%%3Eerror%%3A%%20%%27%%2Bdb%%2B%%27%%3C%%2Fli%%3E%%27%%29%%3B%%7Dif%%28list.length%%3E0%%29%%7BcommitDbs%%28list%%29%%3B%%7D%%7D%%7D%%29%%3B%%7DcommitDbs%%28dbs%%29%%3B%%7D%%7D%%29%%3B|Commit
All Databases]]

Now visit Futon on your CouchDB instance at http://localhost:5984/_utils/, and select the
bookmark. It will use the !JavaScript libraries included with Futon to ensure all your databases
are fully committed.

Alternatively, here is a simple HTML file that you can upload to your CouchDB using Futon.
When you visit it, it will make sure your data is all safely committed. If you prefer a shell
script, skip below this file.

Save this HTML to a file called `commit_all.html` on your machine:

{{{
    <!DOCTYPE html>
    <html>
      <head><title>Commit All Databases</title></head>
      <body>
        <h1>Commit All Databases</h1>
        <p>This script will trigger <tt>_ensure_full_commit</tt> on all databases.</p>
        <ul id="databases"></ul>
      </body>
      <script src="/_utils/script/jquery.js"></script>
      <script src="/_utils/script/jquery.couch.js"></script>
      <script>
        $.couch.allDbs({
          success : function(dbs) {
            dbs.forEach(function(db) {
              $.ajax({
                type: "POST", url: "/" + encodeURIComponent(db) + "/_ensure_full_commit",
                contentType: "application/json", dataType: "json",
                complete : function(r) {
                  if (r.status == 201) {
                    $("#databases").append('<li>committed: '+db+'</li>');
                  } else {
                    $("#databases").append('<li style="color:red;">error: '+db+'</li>');
                  }
                }
              });
            });
          }
        });
      </script>
    </html>
}}}

Now browse to your CouchDB's Futon at http://localhost:5984/_utils/ and create a database.
Open that database, create a document, and save it. Then click the button labeled
"Upload Attachment", choose the `commit_all.html` file you just created, and upload it.
A link to that HTML file will appear in Futon.

Now click the link in Futon for `commit_all.html`, and it will run `_ensure_full_commit` on
all of your databases.

If you prefer a shell script, [[http://wiki.couchone.com/page/ensure-full_commit-sh|this will
also commit all your databases]].
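
Every method follows the same procedure. First, list the databases with `GET /_all_dbs`. Then send `POST /{db}/_ensure_full_commit` to each one, and treat a 201 response as success, as the HTML file above does. The sketch below runs that loop outside the browser. It assumes Node.js 18 or later for the built-in `fetch`, and the function name `commitAllDatabases` is hypothetical. The fetch function is passed in as a parameter, so you can exercise the logic without a live server.

```javascript
// Call _ensure_full_commit on every database, one at a time.
// fetchFn is injected (e.g. the global fetch of Node.js 18+).
async function commitAllDatabases(baseUrl, fetchFn, headers = {}) {
  const base = baseUrl.replace(/\/$/, "");

  // Step 1: list all databases.
  const listResp = await fetchFn(`${base}/_all_dbs`, { headers });
  if (!listResp.ok) throw new Error(`GET /_all_dbs failed: ${listResp.status}`);
  const dbs = await listResp.json();

  const results = [];
  for (const db of dbs) {
    // Step 2: force a full commit; a 201 status is treated as success,
    // matching the check in commit_all.html.
    const resp = await fetchFn(`${base}/${encodeURIComponent(db)}/_ensure_full_commit`, {
      method: "POST",
      headers: { "Content-Type": "application/json", ...headers },
    });
    results.push({ db, committed: resp.status === 201 });
  }
  return results;
}
```

For example, `commitAllDatabases("http://localhost:5984", fetch).then(console.log)` logs one `{ db, committed }` entry per database. If your server requires credentials, pass them in the third argument, for example as an `Authorization` header.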

At this point your data is safe.

==== If you don't have admin credentials ====

**Warning:** make sure you followed the instructions in the section "For everyone" above before
you do the rest of these steps. If you were able to log into CouchDB as an administrator (and
completed the first section, before "For everyone"), then you can skip this section.

In this step we will configure your CouchDB so that future updates will be durable.

Did you commit all databases using one of the methods above? Do that now, or the next action may destroy data.

Now, find CouchDB's configuration file. It will be called `local.ini`, and it is probably in
a location like `/usr/local/etc/couchdb/local.ini`.

Open the file, and add the following lines to it:

{{{
    [couchdb]
    delayed_commits = false
}}}

Now, restart your CouchDB. How you do this depends on your operating system. If you have
your CouchDB configured as a system service, restarting the computer will do the trick, but
if you don't want to do that, you can probably find the pid of CouchDB by running
`ps ax | grep couchdb`. Once you have the pid, you can kill CouchDB by running `kill <pid>`.
If you are a fan of magic, you can do all that in one ninja move by running:

{{{
      kill `ps ax | grep '[c]ouchdb' | head -n1 | awk '{print $1}'`
}}}

Note: you might need to run this with sudo.

Once CouchDB is killed, the system should bring it back up. When it boots, it will load the
config for `delayed_commits = false` so updates from that point forward will be durable.

=== The Bug ===

Now that we have you fixed up, you might enjoy a look at the technicalities of what got broken
in CouchDB.

A commit is what causes writes to become durably flushed to storage. It is an expensive operation.
During a commit, recent writes are flushed to disk and a new database header is written. Finally,
the new header is also flushed to disk. At the operating system level this involves multiple
fsync() calls to ensure data has been fully written.
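
The order of these steps matters: the header must not reach disk before the data it points to. The sketch below shows that ordering with Node.js `fs` calls on a toy append-only log. The file layout is a hypothetical simplification: each document is one JSON line, and a header line marks everything before it as committed. It is loosely inspired by CouchDB's append-only database file and is not its actual format.

```javascript
const fs = require("fs");

// Append documents, flush them, then append a header line and flush again.
// A header is only written after the data it covers is on disk.
function commit(file, docs) {
  const fd = fs.openSync(file, "a");
  try {
    for (const doc of docs) fs.writeSync(fd, JSON.stringify({ doc }) + "\n");
    fs.fsyncSync(fd); // 1. recent writes flushed to disk
    fs.writeSync(fd, JSON.stringify({ header: { committedAt: Date.now() } }) + "\n");
    fs.fsyncSync(fd); // 2. new header flushed to disk
  } finally {
    fs.closeSync(fd);
  }
}

// On open, trust only what precedes the last complete header line;
// anything written after it was never committed and is ignored.
function recover(file) {
  if (!fs.existsSync(file)) return [];
  const entries = fs.readFileSync(file, "utf8").split("\n").map((line) => {
    try {
      return JSON.parse(line);
    } catch (_) {
      return null; // empty or torn (partially written) line
    }
  });
  let lastHeader = -1;
  entries.forEach((e, i) => {
    if (e && e.header) lastHeader = i;
  });
  return entries
    .slice(0, lastHeader + 1)
    .filter((e) => e && "doc" in e)
    .map((e) => e.doc);
}
```

You can model a write that was acknowledged but never committed by appending a document line after `commit(file, docs)` without writing a header. In that case `recover(file)` will not return that document.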

Delayed commits are a feature of CouchDB that allows it to achieve better write performance
for some workloads while sacrificing a small amount of durability. The setting causes CouchDB
to wait up to a full second before committing new data after an update. If the server crashes
before the header is written then any writes since the last commit are lost. The choice of
delayed commits as a default has been discussed many times and the consensus was that they
should remain on for the 1.0 release.
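
The following toy in-memory model makes the trade-off concrete. The class `ToyDb` is hypothetical and is not CouchDB code. With delayed commits, an update is acknowledged right away but only becomes durable at the next commit, so a crash in between loses it. With delayed commits off, each update is committed before it is acknowledged.

```javascript
// Toy model of the durability trade-off of delayed commits (not CouchDB code).
class ToyDb {
  constructor({ delayedCommits }) {
    this.delayedCommits = delayedCommits;
    this.committed = new Map(); // survives a crash
    this.pending = new Map();   // written but not yet covered by a header
  }

  put(id, value) {
    this.pending.set(id, value);
    if (!this.delayedCommits) this.commit(); // commit before acknowledging
    return { ok: true, id };                 // acknowledgement to the client
  }

  // In CouchDB the delayed commit fires up to about a second later.
  commit() {
    for (const [id, v] of this.pending) this.committed.set(id, v);
    this.pending.clear();
  }

  crash() {
    this.pending.clear(); // anything not yet committed is gone
  }

  get(id) {
    return this.pending.has(id) ? this.pending.get(id) : this.committed.get(id);
  }
}
```

In both modes the client sees the same `{ ok: true }` reply. The modes differ only in whether the update would survive a crash that happens immediately after that reply.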

For each open database in CouchDB there is an Erlang process referred to as the update process,
the source for which is in a file called `couch_db_updater.erl`. All writes to a given database
pass through the corresponding update process. This process is in charge of preparing, writing
and committing batches of updates. In order to provide delayed commits, the update process
sets a timer for one second in the future. When the timer expires a commit message is sent
back to the updater. A reference to this timer is kept in the updater state. This reference
prevents the updater from scheduling excessive commit messages when one is already pending.

In the updater code that shipped with 1.0, a delayed commit message that arrived when there
were no pending writes never cleared the timer reference. As a result, the updater state erroneously
indicated that there was a future commit scheduled. Once in this bad state the updater would
never schedule another commit. In practice, this problem occurred when a write conflict was
followed by a period of inactivity. The conflicting write triggered the delayed commit, but
when the commit message arrived no new data needed to be written and the timer reference was
not cleared. This scenario is thankfully unlikely to occur in a busy database.
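
A small simulation reproduces this sequence. It is a hypothetical JavaScript model of the bookkeeping described above, not a translation of `couch_db_updater.erl`. Timers are simulated with a message queue so that the steps are deterministic. With `buggy: true`, a delayed commit message that finds nothing to write leaves the timer reference set. After that, a later real write never schedules a commit.

```javascript
// Hypothetical model of the updater's delayed-commit timer bookkeeping.
class Updater {
  constructor({ buggy }) {
    this.buggy = buggy;
    this.timerRef = null; // non-null while a commit is believed to be scheduled
    this.dirty = false;   // true when there are uncommitted writes
    this.mailbox = [];    // timer messages waiting to be delivered
    this.commits = 0;
    this.nextRef = 1;
  }

  // A write request. A conflicting write still schedules a delayed commit
  // but produces no new data to commit.
  write({ conflict = false } = {}) {
    if (!conflict) this.dirty = true;
    if (this.timerRef === null) {
      this.timerRef = this.nextRef++;
      this.mailbox.push("delayed_commit"); // stands in for a one-second timer
    }
  }

  // Let one second pass: deliver every pending timer message.
  tick() {
    for (const msg of this.mailbox.splice(0)) {
      if (msg === "delayed_commit") this.onDelayedCommit();
    }
  }

  onDelayedCommit() {
    if (this.dirty) {
      this.commits += 1; // write and flush the new header
      this.dirty = false;
      this.timerRef = null;
    } else if (!this.buggy) {
      this.timerRef = null; // the fix: clear the reference even when idle
    }
    // In the buggy variant an idle commit message leaves timerRef set, so
    // write() believes a commit is pending and never schedules another.
  }
}
```

The calls `write({ conflict: true })`, `tick()`, `write()`, `tick()` leave the buggy model with `commits` at 0 and `dirty` still true. The fixed model commits once.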

=== Mixups and Fixes ===

One can never say exactly what led to a particular bug. In this case, there were some contributing
factors.

==== Release procedure ====

In the run-up to 1.0, there was some confusion about which branch would ultimately become
1.0. Originally we'd discussed branching 1.0 from the 0.11.x line, as 0.11 was a feature freeze
release, so that we could concentrate on bugs and performance for 1.0. However, as we approached
1.0's release, there was very little work in trunk that involved new features, and the few
features added to trunk were really just refinements of existing functionality to make it
more user-friendly.

So in the final weeks before 1.0's release, we decided to cut it from trunk (as opposed to
from the 0.11.x branch) as that would make for more straightforward code management in the
future. It has also been our release policy since the early days of the project.

As a result, the commit that introduced the bug went into trunk while 0.11.x was still designated
to become the 1.0 release, with the intention that it prove its stability in trunk before a future
1.1 release. After we decided to cut 1.0 from trunk, this commit didn't get the review that its
inclusion in the 1.0 release branch required.

The fix here is that we are now crystal clear that future releases will always be cut from
trunk. So if people are committing changes that they feel are not baked enough for trunk, those
commits will more likely be done in a feature branch. Keeping clear about this is one way
we can avoid similar issues in the future.

==== Code review ====

In the run-up to 1.0, there were mailing list messages about which commits were trivial and
which needed review. In the case of the commits that weren't trivial, the original committer
was the one who said he thought they were fine. In the future, for any commits to the deepest
parts of the storage engine, we will be careful to have review from multiple parties. Many
eyes make bugs shallow, but for code like the core CouchDB storage engine, there aren't a
lot of folks who are ready to review and understand a particular patch.

==== Testing ====

CouchDB currently has a suite of unit and integration tests, which guide development and provide
the first line of documentation. We also have a few independent benchmark suites, which we
can use to track performance improvements and regressions.

What we don't have is a set of correctness stress tests. In this case, a fuzzing test that
applies a random set of operations to a constrained keyspace while tracking the expected
database state, and then restarts the server to make sure the state is as expected, would
have caught the error.
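
Here is a minimal sketch of such a fuzzing test. It assumes a hypothetical store interface with `put`, `del`, `get` and `restart` methods; none of these are CouchDB APIs. It uses the well-known mulberry32 generator, so a failing run can be replayed from its seed. `makeToyStore` is a deliberately simple stand-in, and its `buggyDeletes` option plants a durability bug for the fuzzer to find.

```javascript
// mulberry32: a small seeded PRNG, so a failing run can be replayed by seed.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical store under test: `memory` is lost on restart, `disk` survives.
// With buggyDeletes, a delete is applied in memory but never made durable.
function makeToyStore({ buggyDeletes = false } = {}) {
  const disk = new Map();
  let memory = new Map();
  return {
    put(key, value) { memory.set(key, value); disk.set(key, value); },
    del(key) { memory.delete(key); if (!buggyDeletes) disk.delete(key); },
    get(key) { return memory.get(key); },
    restart() { memory = new Map(disk); },
  };
}

// Apply random operations to a small keyspace, track the expected state in a
// Map, and after every simulated restart compare the store with that model.
function fuzz(createStore, { seed = 1, steps = 500, keyspace = 4 } = {}) {
  const rand = mulberry32(seed);
  const store = createStore();
  const expected = new Map();
  for (let step = 0; step < steps; step++) {
    const key = "k" + Math.floor(rand() * keyspace);
    const op = rand();
    if (op < 0.5) {
      const value = Math.floor(rand() * 100);
      store.put(key, value);
      expected.set(key, value);
    } else if (op < 0.8) {
      store.del(key);
      expected.delete(key);
    } else {
      store.restart(); // simulate a server restart, then check every key
      for (let k = 0; k < keyspace; k++) {
        const id = "k" + k;
        const got = store.get(id);
        if (got !== expected.get(id)) {
          return { seed, step, key: id, got, want: expected.get(id) };
        }
      }
    }
  }
  return null; // no divergence found for this seed
}
```

When a seed fails, calling `fuzz(createStore, { seed })` again with the same seed replays exactly the same sequence of operations. A real harness would stop and start the server in `restart()` and read documents over HTTP in `get()`. The comparison against the model would stay the same.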

We could learn a lot from the [[http://www.sqlite.org/testing.html|SQLite testing methodology]].
Expect to see more stress and correctness tests in CouchDB's future.
