From: Apache Wiki
Date: Sat, 25 Feb 2012 20:09:30 -0000
Subject: [Couchdb Wiki] Update of "ReleaseNotices" by JanLehnardt

Dear Wiki user,

You have subscribed to a wiki page
or wiki category on "Couchdb Wiki" for change notification.

The "ReleaseNotices" page has been changed by JanLehnardt:
http://wiki.apache.org/couchdb/ReleaseNotices

Comment:
new release notices page

New page:

= Release Notices =

Sometimes, after we make a release, we might find out that something is wrong with it that is so severe that we need to tell everyone who runs that release. This page collects these notices.

== 1.0.0 ==

**A 1.0.0 RECOVERY TOOL IS NOW AVAILABLE**

Download the [[http://wiki.couchone.com/page/repair-tool#/|CouchDB 1.0.0 Repair Tool]] to recover data.

=== Notes on a Nasty Bug ===

Developers should be using only the 1.0.1 release at this point, not the 1.0.0 version. Read on to find out why.

On the weekend of August 7th–8th, 2010, we discovered and fixed a bug in CouchDB 1.0.0. The problem was subtle (cancelling a timer without deleting the reference to it), but the ramifications were not: there was potential data loss for users of 1.0.0. The 1.0.1 release contains a permanent fix and is [[../downloads.html|available now on the download page]].

We are proud of how quickly the CouchDB community recovered from this bug and went the extra mile to make sure everyone's data was safe. It is clear we have a group of developers who care enough about all users' data that they aggressively pursued an "edge case" bug so no one would be caught off guard. Further, the team worked for the next week to create a repair tool to recover access to data which was affected by the bug. As a result, no users lost data permanently. Kudos!

=== The Remedy ===

For current users, these instructions will ensure your data is safe. First: **do not restart your CouchDB!** The hot fix involves changing configuration on the running server, so have your admin credentials handy (if your CouchDB is in Admin Party mode with no admins defined, you won't need admin credentials).
(If you do not have admin credentials but can restart the server, you can still prevent data loss. Read on.)

==== If you have admin credentials (or if your CouchDB is in Admin Party mode) ====

Visit the Futon admin console at http://yourserver:5984/_utils/, and click "Login" in the lower right-hand corner. Log in as an administrator, and visit the "Configuration" page linked in the sidebar: http://yourserver:5984/_utils/config.html

On the configuration page, set `delayed_commits` (in the `couchdb` section) to `false`. You can do this by clicking on the word `true`, replacing it with `false`, and pressing Enter.

The next time you write a document to each database, CouchDB will commit the header to disk, and your data will be secure. For safety, please continue with the next set of instructions.

==== For everyone ====

To ensure that each database is committed, you can use the `_ensure_full_commit` command. There are a few ways to do this.

The simplest method is to right-click the "Commit All Databases" bookmarklet link and add it to your bookmarks.
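All of the methods in this section do the same thing: list the databases (the bookmarklet uses Futon's `$.couch.allDbs`, which reads `/_all_dbs`), then POST to `/<database>/_ensure_full_commit` for each one, treating HTTP 201 as success. If you would rather run that loop from a terminal, here is a minimal Node.js sketch of it. It is not one of the remedies from the original notice and rests on assumptions: Node.js 18 or later (for the built-in `fetch`), a server at http://localhost:5984, and a CouchDB that accepts these requests without credentials (as in Admin Party mode). `commitAllDatabases` is a hypothetical helper name.

```javascript
// Sketch: run _ensure_full_commit on every database, one at a time.
// Assumptions: Node.js 18+ (global fetch), no credentials required.
// commitAllDatabases is a hypothetical helper, not part of CouchDB.
async function commitAllDatabases(baseUrl = "http://localhost:5984", fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/_all_dbs`);
  if (res.status !== 200) {
    throw new Error(`GET /_all_dbs failed with HTTP ${res.status}`);
  }
  const dbs = await res.json(); // an array of database names
  const results = [];
  for (const db of dbs) {
    // encodeURIComponent matters for names containing "/" (sent as %2F).
    const url = `${baseUrl}/${encodeURIComponent(db)}/_ensure_full_commit`;
    const r = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
    });
    // Like the bookmarklet, treat 201 Created as success.
    results.push({ db, ok: r.status === 201 });
  }
  return results;
}
```

To try it, append a call such as `commitAllDatabases().then(console.log)` and run the file with `node`. Committing one database at a time mirrors the bookmarklet; any entry with `ok: false` is a database worth retrying.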
Bookmarklet: [[javascript:%%24.couch.allDbs%%28%%7Bsuccess%%3Afunction%%28dbs%%29%%7Bfunction%%20commitDbs%%28list%%29%%7Bvar%%20db%%3Dlist.pop%%28%%29%%3B%%24.ajax%%28%%7Btype%%3A%%22POST%%22%%2Curl%%3A%%22%%2F%%22%%2BencodeURIComponent%%28db%%29%%2B%%22%%2F_ensure_full_commit%%22%%2CcontentType%%3A%%22application%%2Fjson%%22%%2CdataType%%3A%%22json%%22%%2Ccomplete%%3Afunction%%28r%%29%%7B%%24%%28%%22%%23content%%22%%29.prepend%%28%%27%%3Cul%%20id%%3D%%22commit_all%%22%%3E%%3C%%2Ful%%3E%%27%%29%%3Bif%%28r.status%%3D%%3D201%%29%%7B%%24%%28%%22%%23commit_all%%22%%29.append%%28%%27%%3Cli%%3Ecommitted%%3A%%20%%27%%2Bdb%%2B%%27%%3C%%2Fli%%3E%%27%%29%%3B%%7Delse%%7B%%24%%28%%22%%23commit_all%%22%%29.append%%28%%27%%3Cli%%20style%%3D%%22color%%3Ared%%3B%%22%%3Eerror%%3A%%20%%27%%2Bdb%%2B%%27%%3C%%2Fli%%3E%%27%%29%%3B%%7Dif%%28list.length%%3E0%%29%%7BcommitDbs%%28list%%29%%3B%%7D%%7D%%7D%%29%%3B%%7DcommitDbs%%28dbs%%29%%3B%%7D%%7D%%29%%3B|Commit All Databases]]

Now visit Futon on your CouchDB instance at http://localhost:5984/_utils/, and select the bookmark. It will use the !JavaScript libraries included with Futon to ensure all your databases are fully committed.

Alternatively, here is a simple HTML file that you can upload to your CouchDB using Futon. When you visit it, it will make sure your data is all safely committed. If you prefer a shell script, skip past this file.

Save this HTML to a file on your machine called `commit_all.html`:

{{{
Commit All Databases

Commit All Databases

This script will trigger _ensure_full_commit on all databases.

}}}

Now browse to your CouchDB's Futon at http://localhost:5984/_utils/ and create a database. Visit that database, create a document, and save it. Click the button labeled "Upload Attachment", choose the `commit_all.html` file you just created, and upload it. A link to that HTML file will appear in Futon.

Now click the link in Futon for `commit_all.html`, and it will run `_ensure_full_commit` on all of your databases.

If you prefer a shell script, [[http://wiki.couchone.com/page/ensure-full_commit-sh|this will also commit all your databases]].

At this point your data is safe.

==== If you don't have admin credentials ====

**Warning:** make sure you followed the instructions in the section "For everyone" above before you do the rest of these steps. If you were able to log into CouchDB as an administrator (and complete the first section, before "For everyone"), then you can skip this section.

In this step we will configure your CouchDB so that future updates will be durable.

Did you run the above HTML script (or one of the other methods from "For everyone")? Do that now: restarting CouchDB discards any writes that have not been committed yet, so skipping it may destroy data.

Now, find CouchDB's configuration file. It will be called `local.ini`, and it is probably in a location like `/usr/local/etc/couchdb/local.ini`.

Open the file, and add the following lines to it:

{{{
[couchdb]
delayed_commits = false
}}}

Now, restart your CouchDB. This will be different on different operating systems. If you have your CouchDB configured as a system service, restarting the computer will do the trick, but if you don't want to do that, you can probably find the pid of CouchDB by running `ps ax | grep couchdb`. Once you have the pid, you can kill CouchDB by running `kill <pid>`. If you are a fan of magic, you can do all that in one ninja move by running:

{{{
kill `ps ax | grep couchdb | head -n1 | awk '{print $1}'`
}}}

Note: you might need to sudo.

Once CouchDB is killed, the system should bring it back up.
When it boots, it will load the config for `delayed_commits = false`, so updates from that point forward will be durable.

=== The Bug ===

Now that we have you fixed up, you might enjoy a look at the technicalities of what got broken in CouchDB.

A commit is what causes writes to become durably flushed to storage. It is an expensive operation. During a commit, recent writes are flushed to disk and a new database header is written. Finally, the new header is also flushed to disk. At the operating system level this involves multiple fsync() calls to ensure data has been fully written.

Delayed commits are a feature of CouchDB that allows it to achieve better write performance for some workloads while sacrificing a small amount of durability. The setting causes CouchDB to wait up to a full second before committing new data after an update. If the server crashes before the header is written, then any writes since the last commit are lost. The choice of delayed commits as a default has been discussed many times, and the consensus was that they should remain on for the 1.0 release.

For each open database in CouchDB there is an Erlang process referred to as the update process, the source for which is in a file called `couch_db_updater.erl`. All writes to a given database pass through the corresponding update process. This process is in charge of preparing, writing and committing batches of updates. In order to provide delayed commits, the update process sets a timer for one second in the future. When the timer expires, a commit message is sent back to the updater. A reference to this timer is kept in the updater state. This reference prevents the updater from scheduling excessive commit messages when one is already pending.

In the updater code that shipped with 1.0, a delayed commit message that arrived when there were no pending writes never cleared the timer reference.
As a result, the updater state erroneously indicated that there was a future commit scheduled. Once in this bad state, the updater would never schedule another commit. In practice, this problem occurred when a write conflict was followed by a period of inactivity. The conflicting write triggered the delayed commit, but when the commit message arrived no new data needed to be written, and the timer reference was not cleared. This scenario is thankfully unlikely to occur in a busy database.

=== Mixups and Fixes ===

One can never say exactly what led to a particular bug. In this case, there were some contributing factors.

==== Release procedure ====

In the run-up to 1.0, there was some confusion about which branch would ultimately become 1.0. Originally we'd discussed branching 1.0 from the 0.11.x line, as 0.11 was a feature-freeze release, so that we could concentrate on bugs and performance for 1.0. However, as we approached 1.0's release, there was very little work in trunk that involved new features, and the few features added to trunk were really just refinements of existing functionality, such as making it more user-friendly.

So in the final weeks before 1.0's release, we decided to cut it from trunk (as opposed to from the 0.11.x branch), as that would make for more straightforward code management in the future. It has also been our release policy since the early days of the project.

As a result, the commit that introduced the bug went into trunk while 0.11.x was still designated to become the 1.0 release, with the intention of having it prove its stability before a future 1.1 release. After we decided to cut 1.0 from trunk, this commit did not get the review it needed before it became part of the 1.0 release branch.

The fix here is that we are now crystal clear that future releases will always be cut from trunk.
So if people are committing stuff that they feel is not baked enough for trunk, those commits are more likely to go into a feature branch. Keeping clear about this is one way we can avoid similar issues in the future.

==== Code review ====

In the run-up to 1.0, there were mailing list messages about which commits were trivial, and which needed review. In the case of the commits that weren't trivial, the original committer was the one who said he thought they were fine. In the future, for any commits to the deepest parts of the storage engine, we will be careful to have review from multiple parties. Many eyes make bugs shallow, but for code like the core CouchDB storage engine, there aren't a lot of folks who are ready to review and understand a particular patch.

==== Testing ====

CouchDB currently has a suite of unit and integration tests, which guide development and provide the first line of documentation. We also have a few independent benchmark suites, which we can use to track performance improvements and regressions.

What we don't have is a set of correctness stress tests. In this case, a fuzzing test that applies a random set of operations to a constrained keyspace while tracking the expected database state, and then restarts the server to make sure the state is as expected, would have caught the error.

We could learn a lot from the [[http://www.sqlite.org/testing.html|SQLite testing methodology]]. Expect to see more stress and correctness tests in CouchDB's future.
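The fuzzing test described above can be sketched at a much smaller scale against a toy model of the update process instead of a real server. Everything in this block is hypothetical: `ToyUpdater`, `makeRng` and `fuzz` are made-up names, time is simulated in milliseconds, and the model only mimics the delayed-commit timer from "The Bug"; it is not CouchDB's Erlang code. The fuzzer applies random writes, rejected (conflicting) writes and idle periods to a three-key keyspace while tracking the expected state. At random checkpoints it idles for longer than the one-second commit delay before simulating a restart, so the small loss that delayed commits legitimately allow is not reported as a failure.

```javascript
// Hypothetical toy model of a per-database update process with delayed
// commits. It mimics the mechanism described in "The Bug"; it is not
// CouchDB's Erlang code. Time is simulated in milliseconds.
class ToyUpdater {
  constructor(buggy) {
    this.buggy = buggy;       // true reproduces the 1.0.0 behaviour
    this.durable = new Map(); // survives a crash (last committed header)
    this.memory = new Map();  // what readers see, incl. uncommitted writes
    this.dirty = false;       // any writes since the last commit?
    this.timerRef = null;     // the updater's belief that a commit is pending
    this.timers = [];         // fire times of timers that will really fire
    this.now = 0;
  }
  armCommitTimer() {
    // At most one delayed commit, one second in the future.
    if (this.timerRef === null) {
      this.timerRef = this.now + 1000;
      this.timers.push(this.timerRef);
    }
  }
  write(key, value) {
    this.memory.set(key, value);
    this.dirty = true;
    this.armCommitTimer();
  }
  conflict() {
    // A rejected update stores nothing but still arms the commit timer.
    this.armCommitTimer();
  }
  advance(ms) {
    this.now += ms;
    // Every real timer fires exactly once.
    const due = this.timers.filter((t) => t <= this.now);
    this.timers = this.timers.filter((t) => t > this.now);
    due.forEach(() => this.onCommitTimer());
  }
  onCommitTimer() {
    if (this.dirty) {
      this.durable = new Map(this.memory); // commit: data plus new header
      this.dirty = false;
      this.timerRef = null;
    } else if (!this.buggy) {
      this.timerRef = null; // the fix: forget the timer that just fired
    }
    // Buggy case: timerRef stays set, so no commit is ever scheduled again.
  }
  restart() {
    this.memory = new Map(this.durable); // uncommitted writes are lost
    this.dirty = false;
    this.timerRef = null;
    this.timers = [];
  }
}

// Small deterministic PRNG (32-bit LCG) so a failing seed can be replayed.
function makeRng(seed) {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
}

// Returns { ok: true } or details of the first lost write.
function fuzz(buggy, seed, steps = 300) {
  const rand = makeRng(seed);
  const db = new ToyUpdater(buggy);
  const expected = new Map(); // acknowledged writes
  const keys = ["a", "b", "c"]; // constrained keyspace
  for (let step = 0; step < steps; step++) {
    const r = rand();
    if (r < 0.4) {
      const key = keys[Math.floor(rand() * keys.length)];
      db.write(key, step);
      expected.set(key, step);
    } else if (r < 0.55) {
      db.conflict();
    } else if (r < 0.9) {
      db.advance(Math.floor(rand() * 1500));
    } else {
      // Idle past the one-second delay, then "restart" and compare.
      db.advance(2000);
      db.restart();
      for (const [key, value] of expected) {
        if (db.memory.get(key) !== value) return { ok: false, seed, step, key };
      }
    }
  }
  return { ok: true };
}
```

Run over a range of seeds, the fixed model (`fuzz(false, seed)`) should always return `{ ok: true }`, while the buggy one should report lost writes for some seeds. In this model a failure can only start with a conflicting write followed by a second or more without any real write, which is the scenario the notice describes.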