From: Robert Newson
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Date: Tue, 10 Aug 2010 09:57:10 +0100
Subject: Re: data recovery tool progress
References: <8385F758-360B-425A-ACBD-03C898BFDA21@apache.org> <1690416A-4C01-4756-9D3B-A256DC729813@apache.org> <154AD543-C787-441C-851B-D59CEA6765CC@apache.org> <5F47BBB4-9F58-4EFE-92C8-B0FEDA5B01B7@apache.org>

Slight correction, this was with delayed_commits=false. My framework
does a PUT to ensure that on every test run.

B.

On Tue, Aug 10, 2010 at 9:55 AM, Robert Newson wrote:
> I ran the db_repair code on a healthy database produced with
> delayed_commits=true.
>
> The source db had 3218 docs. db_repair recovered 3120 and then returned with ok.
>
> I'm redoing that test, but this indicates we're not finding all roots.
>
> I note that the output file was 36 times the input file, which is a
> consequence of folding all possible roots. I think that needs to be in
> the release notes for the repair tool if that behavior remains when it
> ships.
>
> B.
>
> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers wrote:
>> I think I found a bug in the current lost+found repair.
>>
>> I've been running it against the testwritesdb and it's in a state that is
>> never finishing.
>>
>> It's still spitting out these lines:
>>
>> [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>
>> Most are 1001 but there are also other random variances 452, 866, etc.
>>
>> But the file size and dbinfo hasn't budged in over 30 minutes. The size is
>> stuck at 34300002 with the original db file being 54857478.
>>
>> This database only has one document in it that isn't "lost" so if it's
>> finding *any* new docs it should be writing them.
>>
>> I also started another job to recover a production db that is quite large,
>> 500megs, with the missing data a week or so back. This has been running for
>> 2 hours and has still not output anything or created the lost and found db
>> so I can only assume that it is in the same state.
>>
>> Both machines are still churning 100% CPU.
>>
>> -Mikeal
>>
>>
>> On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski wrote:
>>
>>> With Randall's help we hooked the new node scanner up to the lost+found DB
>>> generator. It seems to work well enough for small DBs; for large DBs with
>>> lots of missing nodes the O(N^2) complexity of the problem catches up to the
>>> code and generating the lost+found DB takes quite some time. Mikeal is
>>> running tests tonight. The algo appears pretty CPU-limited, so a little
>>> parallelization may be warranted.
>>>
>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>
>>> Adam
>>>
>>> (I sent this previous update to myself instead of the list, so I'll forward
>>> it here ...)
>>>
>>> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>>
>>> > On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>> >
>>> >> Right, make_lost_and_found still relies on code which reads through
>>> >> couch_file one byte at a time, that's the cause of the slowness.
>>> >> The newer scanner will improve that pretty dramatically, and we can tune
>>> >> it further by increasing the length of the pattern that we match when
>>> >> looking for kp/kv_node terms in the files, at the expense of some extra
>>> >> complexity dealing with the block prefixes (currently it does a 1-byte
>>> >> match, which as I understand it cannot be split across blocks).
>>> >
>>> > The scanner now looks for a 7 byte match, unless it is within 6 bytes of
>>> > a block boundary, in which case it looks for the longest possible match at
>>> > that position. The more specific match condition greatly reduces the # of
>>> > calls to couch_file, and thus boosts the throughput. On my laptop it can
>>> > scan the testwritesdb.couch from Mikeal's couchtest repo (52 MB) in 18
>>> > seconds.
>>> >
>>> >> Regarding the file_corruption error on the larger file, I think this is
>>> >> something we will just naturally trigger when we take a guess that random
>>> >> positions in a file are actually the beginning of a term. I think our best
>>> >> recourse here is to return {error, file_corruption} from couch_file but
>>> >> leave the gen_server up and running instead of terminating it. That way the
>>> >> repair code can ignore the error and keep moving without having to reopen
>>> >> the file.
>>> >
>>> > I committed this change (to my db_repair branch) after consulting with
>>> > Chris. The longer match condition makes these spurious file_corruption
>>> > triggers much less likely, but I think it's still a good thing not to crash
>>> > the server when they happen.
>>> >
>>> >> Next steps as I understand them - Randall is working on integrating the
>>> >> in-memory scanner into Volker's code that finds all the dangling by_id
>>> >> nodes. I'm working on making sure that the scanner identifies bt node
>>> >> candidates which span block prefixes, and on improving its pattern-matching.
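
The block-boundary rule Adam describes in this message (a 7 byte match, shortened when the position is within 6 bytes of a block boundary) can be sketched roughly as follows. This is a Python illustration only, not the Erlang in the db_repair branch; the 4096-byte block size and the name `match_length` are assumptions made for the example.

```python
BLOCK_SIZE = 4096   # assumed block size, for illustration only
PATTERN_LEN = 7     # the "7 byte match" mentioned in the thread


def match_length(pos, block_size=BLOCK_SIZE, pattern_len=PATTERN_LEN):
    """Return how many bytes to match at file offset `pos`.

    The full pattern is used unless fewer than `pattern_len` bytes remain
    before the next block boundary, in which case only the bytes that fit
    in the current block are matched.
    """
    remaining_in_block = block_size - (pos % block_size)
    return min(pattern_len, remaining_in_block)
```

A shorter match near a boundary is less specific, so it would be expected to produce more false candidates there; that is the trade-off the thread mentions between match length and the complexity of handling block prefixes.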
>>> >
>>> > Latest from my end
>>> > http://github.com/kocolosk/couchdb/tree/db_repair
>>> >
>>> >>
>>> >> Adam
>>> >>
>>> >> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>> >>
>>> >>> I pulled down the latest code from Adam's branch @
>>> >>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>> >>>
>>> >>> Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
>>> >>> on a database with 200 lost updates spanning 200 restarts (
>>> >>> http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch )
>>> >>> took about 101 seconds.
>>> >>>
>>> >>> I tried running against a larger database (
>>> >>> http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch )
>>> >>> and I got this exception:
>>> >>>
>>> >>> http://gist.github.com/516491
>>> >>>
>>> >>> -Mikeal
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds wrote:
>>> >>>
>>> >>>> Summing up what went on in IRC for those who were absent.
>>> >>>>
>>> >>>> The latest progress is on Adam's branch at
>>> >>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>> >>>>
>>> >>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>> >>>> lost+found/DbName database to which it merges all nodes not accessible
>>> >>>> from anywhere (any other node found in a full file scan or any header
>>> >>>> pointers).
>>> >>>>
>>> >>>> Currently, make_lost_and_found uses Volker's repair (from
>>> >>>> couch_db_repair_b module, also in Adam's branch).
>>> >>>> Adam found that the bottleneck was couch_file calls and that the
>>> >>>> repair process was taking a very long time so he added
>>> >>>> couch_db_repair:find_nodes_quickly/1 that reads 1MB chunks as binary
>>> >>>> and tries to process it to find nodes instead of scanning back one
>>> >>>> byte at a time. It is currently not hooked up to the repair mechanism.
>>> >>>>
>>> >>>> Making progress. Go team.
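
The chunked scan Randall summarizes (reading 1MB chunks as binary instead of stepping back one byte at a time) might look roughly like this in Python. It is a sketch under stated assumptions: the real find_nodes_quickly/1 is Erlang and its internals are not shown in this thread, the name `find_candidates` and the literal pattern in the demo are hypothetical, and block prefixes are ignored entirely.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB, mirroring the "1MB chunks" in the thread


def find_candidates(stream, pattern, chunk_size=CHUNK_SIZE):
    """Return the absolute offsets of every occurrence of `pattern`.

    The stream is read in large chunks. The last len(pattern) - 1 bytes of
    each window are carried over so a match that straddles a chunk boundary
    is still found, and found exactly once.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    keep = len(pattern) - 1
    offsets = []
    tail = b""   # carried-over bytes from the previous window
    base = 0     # absolute offset of the first byte of `tail`
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = 0
        while True:
            hit = window.find(pattern, start)
            if hit == -1:
                break
            offsets.append(base + hit)
            start = hit + 1
        tail = window[-keep:] if keep else b""
        base += len(window) - len(tail)
    return offsets


if __name__ == "__main__":
    # Hypothetical marker bytes; a tiny chunk size exercises the overlap logic.
    data = io.BytesIO(b"xxkv_nodeyy" * 3)
    print(find_candidates(data, b"kv_node", chunk_size=4))
```

Each offset found this way is only a candidate: as noted elsewhere in the thread, guessing that a position starts a term can trigger spurious file_corruption errors, so the caller still has to validate every hit.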
>>> >>>>
>>> >>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers wrote:
>>> >>>>> jchris suggested on IRC that I try a normal doc update and see if
>>> >>>>> that fixes it.
>>> >>>>>
>>> >>>>> It does. After a new doc was created the dbinfo doc count was back to
>>> >>>>> normal.
>>> >>>>>
>>> >>>>> -Mikeal
>>> >>>>>
>>> >>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>>> >>>>>
>>> >>>>>> Ok, I pulled down this code and tested against a database with a ton
>>> >>>>>> of missing writes right before a single restart.
>>> >>>>>>
>>> >>>>>> Before restart this was the database:
>>> >>>>>>
>>> >>>>>> {
>>> >>>>>> db_name: "testwritesdb"
>>> >>>>>> doc_count: 124969
>>> >>>>>> doc_del_count: 0
>>> >>>>>> update_seq: 124969
>>> >>>>>> purge_seq: 0
>>> >>>>>> compact_running: false
>>> >>>>>> disk_size: 54857478
>>> >>>>>> instance_start_time: "1281384140058211"
>>> >>>>>> disk_format_version: 5
>>> >>>>>> }
>>> >>>>>>
>>> >>>>>> After restart it was this:
>>> >>>>>>
>>> >>>>>> {
>>> >>>>>> db_name: "testwritesdb"
>>> >>>>>> doc_count: 1
>>> >>>>>> doc_del_count: 0
>>> >>>>>> update_seq: 1
>>> >>>>>> purge_seq: 0
>>> >>>>>> compact_running: false
>>> >>>>>> disk_size: 54857478
>>> >>>>>> instance_start_time: "1281384593876026"
>>> >>>>>> disk_format_version: 5
>>> >>>>>> }
>>> >>>>>>
>>> >>>>>> After repair, it's this:
>>> >>>>>>
>>> >>>>>> {
>>> >>>>>> db_name: "testwritesdb"
>>> >>>>>> doc_count: 1
>>> >>>>>> doc_del_count: 0
>>> >>>>>> update_seq: 124969
>>> >>>>>> purge_seq: 0
>>> >>>>>> compact_running: false
>>> >>>>>> disk_size: 54857820
>>> >>>>>> instance_start_time: "1281385990193289"
>>> >>>>>> disk_format_version: 5
>>> >>>>>> committed_update_seq: 124969
>>> >>>>>> }
>>> >>>>>>
>>> >>>>>> All the sequences are there and hitting _all_docs shows all the
>>> >>>>>> documents so why is the doc_count only 1 in the dbinfo?
>>> >>>>>>
>>> >>>>>> -Mikeal
>>> >>>>>>
>>> >>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana <fdmanana@apache.org> wrote:
>>> >>>>>>
>>> >>>>>>> For the record (and people not on IRC), the code at:
>>> >>>>>>>
>>> >>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>> >>>>>>>
>>> >>>>>>> is working for at least simple cases. Use
>>> >>>>>>> couch_db_repair:repair(DbNameAsString).
>>> >>>>>>> There's one TODO: update the reduce values for the by_seq and
>>> >>>>>>> by_id BTrees.
>>> >>>>>>>
>>> >>>>>>> If anyone wants to give some help on this, you're welcome.
>>> >>>>>>>
>>> >>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> I'm starting to create a bunch of test db files that expose this
>>> >>>>>>>> bug under different conditions like multiple restarts, across
>>> >>>>>>>> compaction, variances in updates that might cause conflict, etc.
>>> >>>>>>>>
>>> >>>>>>>> http://github.com/mikeal/couchtest
>>> >>>>>>>>
>>> >>>>>>>> The README outlines what was done to the db's and what needs to be
>>> >>>>>>>> recovered.
>>> >>>>>>>>
>>> >>>>>>>> -Mikeal
>>> >>>>>>>>
>>> >>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <fdmanana@apache.org> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <robert.newson@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Doesn't this bit;
>>> >>>>>>>>>>
>>> >>>>>>>>>> -        Db#db{waiting_delayed_commit=nil};
>>> >>>>>>>>>> +        Db;
>>> >>>>>>>>>> +        % Db#db{waiting_delayed_commit=nil};
>>> >>>>>>>>>>
>>> >>>>>>>>>> revert the bug fix?
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> That's intentional, for my local testing.
>>> >>>>>>>>> That patch isn't obviously anything close to final, it's too
>>> >>>>>>>>> experimental yet.
>>> >>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> B.
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt wrote:
>>> >>>>>>>>>>> Hi All,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but he
>>> >>>>>>>>>>> isn't done yet.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Here's the current patch:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> it is not done and very early, but any help on this is greatly
>>> >>>>>>>>>>> appreciated.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> The current state is (in Filipe's words):
>>> >>>>>>>>>>> - i can detect that a file needs repair
>>> >>>>>>>>>>> - and get the last btree roots from it
>>> >>>>>>>>>>> - "only" missing: get last db seq num
>>> >>>>>>>>>>> - write new header
>>> >>>>>>>>>>> - and deal with the local docs btree (if exists)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks!
>>> >>>>>>>>>>> Jan
>>> >>>>>>>>>>> --
>>> >>>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> --
>>> >>>>>>>>> Filipe David Manana,
>>> >>>>>>>>> fdmanana@apache.org
>>> >>>>>>>>>
>>> >>>>>>>>> "Reasonable men adapt themselves to the world.
>>> >>>>>>>>> Unreasonable men adapt the world to themselves.
>>> >>>>>>>>> That's why all progress depends on unreasonable men."