From: Filipe David Manana <fdmanana@gmail.com>
To: dev@couchdb.apache.org
Date: Tue, 10 Aug 2010 10:28:36 +0100
Subject: Re: data recovery tool progress

On Tue, Aug 10, 2010 at 9:55 AM, Robert Newson wrote:
> I ran the db_repair code on a healthy database produced with
> delayed_commits=true.
>
> The source db had 3218 docs. db_repair recovered 3120 and then returned
> with ok.

When a DB is repaired, couch_db_repair:repair/1 returns something matching
{ok, repaired, _BTreeInfos}. If it returns only the atom 'ok', it means it
did nothing to the DB file. At least that is how my original code behaves;
I don't know whether the forks changed that.

> I'm redoing that test, but this indicates we're not finding all roots.
>
> I note that the output file was 36 times the size of the input file,
> which is a consequence of folding all possible roots. I think that needs
> to be in the release notes for the repair tool if that behaviour remains
> when it ships.
>
> B.
>
> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers wrote:
>
>> I think I found a bug in the current lost+found repair.
>>
>> I've been running it against the testwritesdb and it's in a state that
>> never finishes.
>>
>> It's still spitting out these lines:
>>
>>   [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>
>> Most are 1001, but there are other random variances: 452, 866, etc.
>>
>> But the file size and dbinfo haven't budged in over 30 minutes. The
>> size is stuck at 34300002, with the original db file being 54857478.
>>
>> This database only has one document in it that isn't "lost", so if
>> it's finding *any* new docs it should be writing them.
>>
>> I also started another job to recover a production db that is quite
>> large, 500 megs, with the missing data a week or so back. This has
>> been running for 2 hours and has still not output anything or created
>> the lost and found db, so I can only assume that it is in the same
>> state.
>>
>> Both machines are still churning 100% CPU.
>>
>> -Mikeal
>>
>> On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski wrote:
>>
>>> With Randall's help we hooked the new node scanner up to the
>>> lost+found DB generator. It seems to work well enough for small DBs;
>>> for large DBs with lots of missing nodes the O(N^2) complexity of the
>>> problem catches up to the code and generating the lost+found DB takes
>>> quite some time. Mikeal is running tests tonight. The algo appears
>>> pretty CPU-limited, so a little parallelization may be warranted.
>>>
>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>
>>> Adam
>>>
>>> (I sent this previous update to myself instead of the list, so I'll
>>> forward it here ...)
>>>
>>> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>>
>>>> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>>>
>>>>> Right, make_lost_and_found still relies on code which reads through
>>>>> couch_file one byte at a time; that's the cause of the slowness.
>>>>> The newer scanner will improve that pretty dramatically, and we can
>>>>> tune it further by increasing the length of the pattern that we
>>>>> match when looking for kp/kv_node terms in the files, at the expense
>>>>> of some extra complexity dealing with the block prefixes (currently
>>>>> it does a 1-byte match, which as I understand it cannot be split
>>>>> across blocks).
>>>>
>>>> The scanner now looks for a 7-byte match, unless it is within 6 bytes
>>>> of a block boundary, in which case it looks for the longest possible
>>>> match at that position. The more specific match condition greatly
>>>> reduces the number of calls to couch_file, and thus boosts the
>>>> throughput. On my laptop it can scan the testwritesdb.couch from
>>>> Mikeal's couchtest repo (52 MB) in 18 seconds.
>>>>
>>>>> Regarding the file_corruption error on the larger file, I think this
>>>>> is something we will just naturally trigger when we take a guess
>>>>> that random positions in a file are actually the beginning of a
>>>>> term. I think our best recourse here is to return {error,
>>>>> file_corruption} from couch_file but leave the gen_server up and
>>>>> running instead of terminating it. That way the repair code can
>>>>> ignore the error and keep moving without having to reopen the file.
>>>>
>>>> I committed this change (to my db_repair branch) after consulting
>>>> with Chris. The longer match condition makes these spurious
>>>> file_corruption triggers much less likely, but I think it's still a
>>>> good thing not to crash the server when they happen.
>>>>
>>>>> Next steps as I understand them: Randall is working on integrating
>>>>> the in-memory scanner into Volker's code that finds all the dangling
>>>>> by_id nodes. I'm working on making sure that the scanner identifies
>>>>> btree node candidates which span block prefixes, and on improving
>>>>> its pattern-matching.
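[Editor's note: a concept-level sketch of the block-prefix matching described above, in Python for readability rather than the project's Erlang. The 4 KB block size, the 1-byte prefix, and the pattern used are illustrative assumptions, not details taken from the patch.]

```python
# Illustrative sketch only (not CouchDB's actual scanner). Assumes a
# couch_file-style layout in which every BLOCK_SIZE bytes begin with a
# PREFIX_LEN-byte marker that interrupts the serialized data; both
# constants are assumptions for illustration.
BLOCK_SIZE = 4096
PREFIX_LEN = 1

def match_at(buf: bytes, pos: int, pattern: bytes) -> bool:
    """True if `pattern` occurs at `pos`, stepping over the prefix byte
    found at every block boundary (so a match may straddle two blocks)."""
    i = pos
    for byte in pattern:
        if i % BLOCK_SIZE == 0:  # a block-prefix byte sits here; skip it
            i += PREFIX_LEN
        if i >= len(buf) or buf[i] != byte:
            return False
        i += 1
    return True

def find_candidates(buf: bytes, pattern: bytes) -> list:
    """Every offset at which `pattern` matches, block prefixes included."""
    return [p for p in range(len(buf)) if match_at(buf, p, pattern)]
```

The trade-off mentioned above shows up directly here: a longer pattern makes false candidates far rarer (fewer couch_file calls to verify them), at the cost of the boundary bookkeeping in match_at.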
>>>>
>>>> Latest from my end:
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>>> Adam
>>>>>
>>>>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>>>>
>>>>>> I pulled down the latest code from Adam's branch @
>>>>>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>>>>
>>>>>> Running timer:tc(couch_db_repair, make_lost_and_found,
>>>>>> ["multi_conflict"]). on a database with 200 lost updates spanning
>>>>>> 200 restarts
>>>>>> (http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch)
>>>>>> took about 101 seconds.
>>>>>>
>>>>>> I tried running against a larger database
>>>>>> (http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch)
>>>>>> and I got this exception:
>>>>>>
>>>>>> http://gist.github.com/516491
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds
>>>>>> <randall.leeds@gmail.com> wrote:
>>>>>>
>>>>>>> Summing up what went on in IRC for those who were absent.
>>>>>>>
>>>>>>> The latest progress is on Adam's branch at
>>>>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>>>>
>>>>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>>>>>> lost+found/DbName database into which it merges all nodes not
>>>>>>> accessible from anywhere (any other node found in a full file scan
>>>>>>> or any header pointers).
>>>>>>>
>>>>>>> Currently, make_lost_and_found uses Volker's repair (from the
>>>>>>> couch_db_repair_b module, also in Adam's branch).
>>>>>>> Adam found that the bottleneck was couch_file calls and that the
>>>>>>> repair process was taking a very long time, so he added
>>>>>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as
>>>>>>> binary and processes them to find nodes instead of scanning back
>>>>>>> one byte at a time. It is currently not hooked up to the repair
>>>>>>> mechanism.
>>>>>>>
>>>>>>> Making progress. Go team.
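[Editor's note: the chunked scan described above (large binary reads processed in memory, instead of one couch_file call per byte) can be sketched roughly as follows. This is a hypothetical Python illustration: only the 1 MB chunk size comes from the thread; the function name, the overlap handling, and searching for a literal byte pattern are assumptions.]

```python
# Concept sketch of a chunked file scan (hypothetical code, not the
# find_nodes_quickly/1 implementation): read the file in 1 MB slabs and
# search in memory, carrying len(pattern) - 1 trailing bytes between
# reads so a match spanning two reads is still found.
def scan_file(path: str, pattern: bytes, chunk_size: int = 1024 * 1024) -> list:
    hits = []
    overlap = len(pattern) - 1
    offset = 0   # absolute file offset of the next unread byte
    tail = b""   # trailing bytes carried over from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            base = offset - len(tail)  # absolute offset of buf[0]
            pos = buf.find(pattern)
            while pos != -1:
                hits.append(base + pos)
                pos = buf.find(pattern, pos + 1)
            tail = buf[-overlap:] if overlap else b""
            offset += len(chunk)
    return hits
```

Since the carried-over tail is one byte shorter than the pattern, a match starting in the tail could not have been completed in the previous read, so no offset is ever reported twice.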
>>>>>>>
>>>>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers
>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>
>>>>>>>> jchris suggested on IRC that I try a normal doc update and see
>>>>>>>> if that fixes it.
>>>>>>>>
>>>>>>>> It does. After a new doc was created the dbinfo doc count was
>>>>>>>> back to normal.
>>>>>>>>
>>>>>>>> -Mikeal
>>>>>>>>
>>>>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers
>>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ok, I pulled down this code and tested against a database with
>>>>>>>>> a ton of missing writes right before a single restart.
>>>>>>>>>
>>>>>>>>> Before restart this was the database:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   db_name: "testwritesdb",
>>>>>>>>>   doc_count: 124969,
>>>>>>>>>   doc_del_count: 0,
>>>>>>>>>   update_seq: 124969,
>>>>>>>>>   purge_seq: 0,
>>>>>>>>>   compact_running: false,
>>>>>>>>>   disk_size: 54857478,
>>>>>>>>>   instance_start_time: "1281384140058211",
>>>>>>>>>   disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After restart it was this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   db_name: "testwritesdb",
>>>>>>>>>   doc_count: 1,
>>>>>>>>>   doc_del_count: 0,
>>>>>>>>>   update_seq: 1,
>>>>>>>>>   purge_seq: 0,
>>>>>>>>>   compact_running: false,
>>>>>>>>>   disk_size: 54857478,
>>>>>>>>>   instance_start_time: "1281384593876026",
>>>>>>>>>   disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After repair, it's this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   db_name: "testwritesdb",
>>>>>>>>>   doc_count: 1,
>>>>>>>>>   doc_del_count: 0,
>>>>>>>>>   update_seq: 124969,
>>>>>>>>>   purge_seq: 0,
>>>>>>>>>   compact_running: false,
>>>>>>>>>   disk_size: 54857820,
>>>>>>>>>   instance_start_time: "1281385990193289",
>>>>>>>>>   disk_format_version: 5,
>>>>>>>>>   committed_update_seq: 124969
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> All the sequences are there and hitting _all_docs shows all the
>>>>>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>>>>
>>>>>>>>> -Mikeal
>>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>>>>
>>>>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>>>>
>>>>>>>>>> is working for at least simple cases. Use
>>>>>>>>>> couch_db_repair:repair(DbNameAsString).
>>>>>>>>>> There's one TODO: update the reduce values for the by_seq and
>>>>>>>>>> by_id BTrees.
>>>>>>>>>>
>>>>>>>>>> If anyone wants to give some help on this, you're welcome.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers
>>>>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm starting to create a bunch of test db files that expose
>>>>>>>>>>> this bug under different conditions like multiple restarts,
>>>>>>>>>>> across compaction, variances in updates that might cause
>>>>>>>>>>> conflicts, etc.
>>>>>>>>>>>
>>>>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>>>>
>>>>>>>>>>> The README outlines what was done to the db's and what needs
>>>>>>>>>>> to be recovered.
>>>>>>>>>>>
>>>>>>>>>>> -Mikeal
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>>>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>>>>>>>>>>> <robert.newson@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Doesn't this bit:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>> + Db;
>>>>>>>>>>>>> + % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>>
>>>>>>>>>>>>> revert the bug fix?
>>>>>>>>>>>>
>>>>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>>>>> That patch obviously isn't anything close to final; it's
>>>>>>>>>>>> still too experimental.
>>>>>>>>>>>>
>>>>>>>>>>>>> B.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Filipe jumped in to start working on the recovery tool,
>>>>>>>>>>>>>> but he isn't done yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is not done and very early, but any help on this is
>>>>>>>>>>>>>> greatly appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>>>>> - I can detect that a file needs repair
>>>>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>>>>> - "only" missing: get the last db seq num
>>>>>>>>>>>>>> - write a new header
>>>>>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Filipe David Manana,
>>>>>>>>>>>> fdmanana@apache.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Filipe David Manana,
>>>>>>>>>> fdmanana@apache.org
--
Filipe David Manana,
fdmanana@apache.org

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."