Subject: Re: data recovery tool progress
From: Jan Lehnardt
Date: Tue, 10 Aug 2010 12:27:10 +0200
Message-Id: <12229601-B7B8-4E98-931E-054DA00C5092@apache.org>
To: dev@couchdb.apache.org

On 10 Aug 2010, at 10:55, Robert Newson wrote:

> I ran the db_repair code on a healthy database produced with
> delayed_commits=true.
>
> The source db had 3218 docs. db_repair recovered 3120 and then returned
> with ok.

This looks like we are recovering nodes that don't need recovering,
because on a healthy db produced with delayed_commits=true we should
not have any orphans at all; the lost and found db should be empty.

> I'm redoing that test, but this indicates we're not finding all roots.
>
> I note that the output file was 36 times the input file, which is a
> consequence of folding all possible roots. I think that needs to be in
> the release notes for the repair tool if that behavior remains when it
> ships.
>
> B.
>
> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers wrote:
>> I think I found a bug in the current lost+found repair.
>>
>> I've been running it against the testwritesdb and it's in a state that
>> never finishes.
>>
>> It's still spitting out these lines:
>>
>> [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>
>> Most are 1001, but there are also other random variances: 452, 866, etc.
>>
>> But the file size and dbinfo haven't budged in over 30 minutes. The
>> size is stuck at 34300002 with the original db file being 54857478.
>>
>> This database only has one document in it that isn't "lost", so if
>> it's finding *any* new docs it should be writing them.
>>
>> I also started another job to recover a production db that is quite
>> large, 500 megs, with the missing data a week or so back. This has
>> been running for 2 hours and has still not output anything or created
>> the lost and found db, so I can only assume that it is in the same
>> state.
>>
>> Both machines are still churning 100% CPU.
>>
>> -Mikeal
>>
>> On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski wrote:
>>
>>> With Randall's help we hooked the new node scanner up to the
>>> lost+found DB generator. It seems to work well enough for small DBs;
>>> for large DBs with lots of missing nodes the O(N^2) complexity of the
>>> problem catches up to the code and generating the lost+found DB takes
>>> quite some time. Mikeal is running tests tonight. The algo appears
>>> pretty CPU-limited, so a little parallelization may be warranted.
>>>
>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>
>>> Adam
>>>
>>> (I sent this previous update to myself instead of the list, so I'll
>>> forward it here ...)
>>>
>>> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>>
>>>> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>>>
>>>>> Right, make_lost_and_found still relies on code which reads through
>>>>> couch_file one byte at a time; that's the cause of the slowness.
>>>>> The newer scanner will improve that pretty dramatically, and we can
>>>>> tune it further by increasing the length of the pattern that we
>>>>> match when looking for kp/kv_node terms in the files, at the
>>>>> expense of some extra complexity dealing with the block prefixes
>>>>> (currently it does a 1-byte match, which as I understand it cannot
>>>>> be split across blocks).
>>>>
>>>> The scanner now looks for a 7-byte match, unless it is within 6
>>>> bytes of a block boundary, in which case it looks for the longest
>>>> possible match at that position. The more specific match condition
>>>> greatly reduces the number of calls to couch_file, and thus boosts
>>>> the throughput. On my laptop it can scan the testwritesdb.couch from
>>>> Mikeal's couchtest repo (52 MB) in 18 seconds.
>>>>
>>>>> Regarding the file_corruption error on the larger file, I think
>>>>> this is something we will just naturally trigger when we take a
>>>>> guess that random positions in a file are actually the beginning of
>>>>> a term. I think our best recourse here is to return {error,
>>>>> file_corruption} from couch_file but leave the gen_server up and
>>>>> running instead of terminating it. That way the repair code can
>>>>> ignore the error and keep moving without having to reopen the file.
>>>>
>>>> I committed this change (to my db_repair branch) after consulting
>>>> with Chris. The longer match condition makes these spurious
>>>> file_corruption triggers much less likely, but I think it's still a
>>>> good thing not to crash the server when they happen.
>>>>
>>>>> Next steps as I understand them: Randall is working on integrating
>>>>> the in-memory scanner into Volker's code that finds all the
>>>>> dangling by_id nodes. I'm working on making sure that the scanner
>>>>> identifies bt node candidates which span block prefixes, and on
>>>>> improving its pattern-matching.
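For illustration, a minimal sketch of the scanner idea described
above. This is not the code in the db_repair branch: it assumes btree
nodes were appended via term_to_binary/1 as {kv_node, _} or
{kp_node, _} tuples, so each one starts with the fixed external term
format bytes 131,104,2,100,0,7 followed by the atom name (atoms encode
as ATOM_EXT in the OTP of this era), and it ignores the couch_file
block prefixes the real scanner has to deal with:

-module(repair_sketch).
-export([find_candidates/1]).

%% Sketch only -- not couch_db_repair. Returns offsets in Bin that
%% look like the start of a serialized btree node.
find_candidates(Bin) ->
    find_candidates(Bin, 0, []).

%% 131 = version, 104 = small tuple, 2 = arity, 100 = atom, 0,7 = length.
find_candidates(<<131,104,2,100,0,7,Tail/binary>> = Bin, Off, Acc0) ->
    Acc = case Tail of
              <<"kv_node", _/binary>> -> [Off | Acc0];
              <<"kp_node", _/binary>> -> [Off | Acc0];
              _ -> Acc0
          end,
    <<_, Rest/binary>> = Bin,
    find_candidates(Rest, Off + 1, Acc);
find_candidates(<<_, Rest/binary>>, Off, Acc) ->
    find_candidates(Rest, Off + 1, Acc);
find_candidates(<<>>, _Off, Acc) ->
    lists:reverse(Acc).

Each candidate offset would then be handed to something like
couch_file:pread_term/2; the positions that deserialize cleanly are
real nodes, the rest are the spurious matches Adam mentions.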
>>>>
>>>> Latest from my end:
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>>> Adam
>>>>>
>>>>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>>>>
>>>>>> I pulled down the latest code from Adam's branch @
>>>>>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>>>>
>>>>>> Running timer:tc(couch_db_repair, make_lost_and_found,
>>>>>> ["multi_conflict"]). on a database with 200 lost updates spanning
>>>>>> 200 restarts (
>>>>>> http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch )
>>>>>> took about 101 seconds.
>>>>>>
>>>>>> I tried running against a larger database (
>>>>>> http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch )
>>>>>> and I got this exception:
>>>>>>
>>>>>> http://gist.github.com/516491
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds wrote:
>>>>>>
>>>>>>> Summing up what went on in IRC for those who were absent.
>>>>>>>
>>>>>>> The latest progress is on Adam's branch at
>>>>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>>>>
>>>>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>>>>>> lost+found/DbName database to which it merges all nodes not
>>>>>>> accessible from anywhere (any other node found in a full file
>>>>>>> scan, or any header pointers).
>>>>>>>
>>>>>>> Currently, make_lost_and_found uses Volker's repair (from the
>>>>>>> couch_db_repair_b module, also in Adam's branch).
>>>>>>> Adam found that the bottleneck was couch_file calls and that the
>>>>>>> repair process was taking a very long time, so he added
>>>>>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as
>>>>>>> binaries and processes them to find nodes, instead of scanning
>>>>>>> back one byte at a time. It is currently not hooked up to the
>>>>>>> repair mechanism.
>>>>>>>
>>>>>>> Making progress. Go team.
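A sketch of that chunked scan, with hypothetical names (this is not
find_nodes_quickly itself): read the file in 1 MB slabs via
file:pread/3, keep a small overlap between slabs so a node prefix that
straddles a slab boundary isn't missed, and feed each slab to the
find_candidates/1 matcher sketched earlier in this mail:

%% Continues the hypothetical repair_sketch module from above.
-define(CHUNK, 1024*1024).
-define(OVERLAP, 16). %% longer than any prefix we match on

scan_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try scan_file(Fd, 0, []) after file:close(Fd) end.

scan_file(Fd, Pos, Acc) ->
    case file:pread(Fd, Pos, ?CHUNK + ?OVERLAP) of
        {ok, Bin} ->
            %% Offsets falling in the overlap belong to the next slab,
            %% which re-reads those bytes and finds them itself.
            Offs = [Pos + O || O <- find_candidates(Bin), O < ?CHUNK],
            case byte_size(Bin) > ?CHUNK of
                true  -> scan_file(Fd, Pos + ?CHUNK, Offs ++ Acc);
                false -> lists:sort(Offs ++ Acc)  %% short read: EOF
            end;
        eof ->
            lists:sort(Acc)
    end.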
>>>>>>>
>>>>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers wrote:
>>>>>>>
>>>>>>>> jchris suggested on IRC that I try a normal doc update and see
>>>>>>>> if that fixes it.
>>>>>>>>
>>>>>>>> It does. After a new doc was created the dbinfo doc count was
>>>>>>>> back to normal.
>>>>>>>>
>>>>>>>> -Mikeal
>>>>>>>>
>>>>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers
>>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ok, I pulled down this code and tested against a database with
>>>>>>>>> a ton of missing writes right before a single restart.
>>>>>>>>>
>>>>>>>>> Before restart this was the database:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> db_name: "testwritesdb"
>>>>>>>>> doc_count: 124969
>>>>>>>>> doc_del_count: 0
>>>>>>>>> update_seq: 124969
>>>>>>>>> purge_seq: 0
>>>>>>>>> compact_running: false
>>>>>>>>> disk_size: 54857478
>>>>>>>>> instance_start_time: "1281384140058211"
>>>>>>>>> disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After restart it was this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> db_name: "testwritesdb"
>>>>>>>>> doc_count: 1
>>>>>>>>> doc_del_count: 0
>>>>>>>>> update_seq: 1
>>>>>>>>> purge_seq: 0
>>>>>>>>> compact_running: false
>>>>>>>>> disk_size: 54857478
>>>>>>>>> instance_start_time: "1281384593876026"
>>>>>>>>> disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After repair, it's this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> db_name: "testwritesdb"
>>>>>>>>> doc_count: 1
>>>>>>>>> doc_del_count: 0
>>>>>>>>> update_seq: 124969
>>>>>>>>> purge_seq: 0
>>>>>>>>> compact_running: false
>>>>>>>>> disk_size: 54857820
>>>>>>>>> instance_start_time: "1281385990193289"
>>>>>>>>> disk_format_version: 5
>>>>>>>>> committed_update_seq: 124969
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> All the sequences are there and hitting _all_docs shows all the
>>>>>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>>>>
>>>>>>>>> -Mikeal
>>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>>>>
>>>>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>>>>
>>>>>>>>>> is working for at least simple cases. Use
>>>>>>>>>> couch_db_repair:repair(DbNameAsString).
>>>>>>>>>> There's one TODO: update the reduce values for the by_seq and
>>>>>>>>>> by_id BTrees.
>>>>>>>>>>
>>>>>>>>>> If anyone wants to give some help on this, you're welcome.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers
>>>>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm starting to create a bunch of test db files that expose
>>>>>>>>>>> this bug under different conditions, like multiple restarts,
>>>>>>>>>>> across compaction, variances in updates that might cause
>>>>>>>>>>> conflicts, etc.
>>>>>>>>>>>
>>>>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>>>>
>>>>>>>>>>> The README outlines what was done to the db's and what needs
>>>>>>>>>>> to be recovered.
>>>>>>>>>>>
>>>>>>>>>>> -Mikeal
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>>>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>>>>>>>>>>> <robert.newson@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Doesn't this bit:
>>>>>>>>>>>>>
>>>>>>>>>>>>> -    Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>> +    Db;
>>>>>>>>>>>>> +    % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>>
>>>>>>>>>>>>> revert the bug fix?
>>>>>>>>>>>>
>>>>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>>>>> That patch obviously isn't anything close to final; it's
>>>>>>>>>>>> still too experimental.
>>>>>>>>>>>>
>>>>>>>>>>>>> B.
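The doc_count puzzle above fits Filipe's open TODO about reduce
values: dbinfo answers doc_count from the reduction cached in the
btree's interior nodes, while _all_docs walks the leaves, so the two
can disagree when a tree is stitched back together without recomputing
the cached values. A toy model of the situation (this is not
couch_btree, and the real by_id reduction tracks more than a count):

-module(toy_btree).
-export([cached_count/1, true_count/1]).

%% Toy btree: kp_nodes cache the doc count of their subtree.
%% A cheap query (think dbinfo's doc_count) trusts the cache:
cached_count({kp_node, _Children, CachedCount}) -> CachedCount;
cached_count({kv_node, KVs}) -> length(KVs).

%% A full walk (think _all_docs) ignores the caches:
true_count({kp_node, Children, _MaybeStale}) ->
    lists:sum([true_count(C) || C <- Children]);
true_count({kv_node, KVs}) ->
    length(KVs).

%% Example of a repaired root with a stale cache:
%%   Root = {kp_node, [{kv_node, [a, b, c]}], 1}.
%%   cached_count(Root) -> 1, but true_count(Root) -> 3.

Fixing the TODO amounts to recomputing every cached value bottom-up.
This would also explain why a single ordinary doc update appeared to
repair Mikeal's count: the write path rewrites the nodes on the branch
it touches, recomputing their reductions as it goes.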
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Filipe jumped in to start working on the recovery tool,
>>>>>>>>>>>>>> but he isn't done yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is not done and very early, but any help on this is
>>>>>>>>>>>>>> greatly appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>>>>> - I can detect that a file needs repair
>>>>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>>>>> - "only" missing: get last db seq num
>>>>>>>>>>>>>> - write new header
>>>>>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Filipe David Manana,
>>>>>>>>>>>> fdmanana@apache.org
>>>>>>>>>>>>
>>>>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>>>>> That's why all progress depends on unreasonable men."
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Filipe David Manana,
>>>>>>>>>> fdmanana@apache.org
>>>>>>>>>>
>>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>>> That's why all progress depends on unreasonable men."
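A closing illustration for the first item on Filipe's list. Under the
couch_file layout of this era -- the file is divided into 4096-byte
blocks, and a block that begins a header has 1 as its first byte --
finding the last committed header can be sketched like this
(hypothetical names, not the patch's code):

%% Continues the hypothetical repair_sketch module from above.
-define(SIZE_BLOCK, 4096).

%% Walk backward block by block looking for the last block flagged as
%% a header. The real code must also deserialize the header and verify
%% its checksum; "needs repair" then means the file holds valid data
%% appended after that last good header.
find_last_header_block(Fd, FileSize) when FileSize > 0 ->
    LastBlock = ((FileSize - 1) div ?SIZE_BLOCK) * ?SIZE_BLOCK,
    walk_back(Fd, LastBlock).

walk_back(_Fd, Pos) when Pos < 0 ->
    no_valid_header;
walk_back(Fd, Pos) ->
    case file:pread(Fd, Pos, 1) of
        {ok, <<1>>} -> {ok, Pos};
        _ -> walk_back(Fd, Pos - ?SIZE_BLOCK)
    end.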