From: Adam Kocoloski
Subject: Re: data recovery tool progress
Date: Tue, 10 Aug 2010 02:26:51 -0400
In-Reply-To: <154AD543-C787-441C-851B-D59CEA6765CC@apache.org>
To: dev@couchdb.apache.org

With Randall's help we hooked the new node scanner up to the lost+found DB generator. It seems to work well enough for small DBs; for large DBs with lots of missing nodes the O(N^2) complexity of the problem catches up to the code and generating the lost+found DB takes quite some time. Mikeal is running tests tonight. The algorithm appears pretty CPU-limited, so a little parallelization may be warranted.

http://github.com/kocolosk/couchdb/tree/db_repair

Adam

(I sent this previous update to myself instead of the list, so I'll forward it here ...)

On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:

> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>
>> Right, make_lost_and_found still relies on code which reads through couch_file one byte at a time; that's the cause of the slowness. The newer scanner will improve that pretty dramatically, and we can tune it further by increasing the length of the pattern that we match when looking for kp/kv_node terms in the files, at the expense of some extra complexity dealing with the block prefixes (currently it does a 1-byte match, which as I understand it cannot be split across blocks).
>
> The scanner now looks for a 7-byte match, unless it is within 6 bytes of a block boundary, in which case it looks for the longest possible match at that position. The more specific match condition greatly reduces the number of calls to couch_file, and thus boosts the throughput. On my laptop it can scan testwritesdb.couch from Mikeal's couchtest repo (52 MB) in 18 seconds.
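Adam's boundary-aware matching can be sketched roughly as follows. This is an illustrative Python model, not the Erlang scanner itself: BLOCK_SIZE and the pattern bytes are made-up stand-ins for CouchDB's real block size and term prefix.

```python
BLOCK_SIZE = 4096                    # assumed block size, for illustration only
PATTERN = b"\x83h\x02d\x00\x07d"     # hypothetical 7-byte node-term prefix

def candidate_offsets(buf, pattern):
    """Scan a buffer for node-term candidates. Use the full pattern
    unless it would cross a block boundary (where a prefix byte may be
    interposed); then fall back to the longest match that fits."""
    hits = []
    for pos in range(len(buf)):
        to_boundary = BLOCK_SIZE - (pos % BLOCK_SIZE)
        n = min(len(pattern), to_boundary)
        if buf[pos:pos + n] == pattern[:n]:
            hits.append(pos)
    return hits
```

Note that the shortened match near a boundary necessarily admits more false positives, which is exactly why spurious candidates (and the file_corruption errors discussed below) are expected.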
>
>> Regarding the file_corruption error on the larger file, I think this is something we will just naturally trigger when we take a guess that random positions in a file are actually the beginning of a term. I think our best recourse here is to return {error, file_corruption} from couch_file but leave the gen_server up and running instead of terminating it. That way the repair code can ignore the error and keep moving without having to reopen the file.
>
> I committed this change (to my db_repair branch) after consulting with Chris. The longer match condition makes these spurious file_corruption triggers much less likely, but I think it's still a good thing not to crash the server when they happen.
>
>> Next steps as I understand them: Randall is working on integrating the in-memory scanner into Volker's code that finds all the dangling by_id nodes. I'm working on making sure that the scanner identifies btree node candidates which span block prefixes, and on improving its pattern matching.
>
> Latest from my end:
> http://github.com/kocolosk/couchdb/tree/db_repair
>
>> Adam
>>
>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>
>>> I pulled down the latest code from Adam's branch @
>>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>
>>> Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
>>> on a database with 200 lost updates spanning 200 restarts (
>>> http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch )
>>> took about 101 seconds.
>>>
>>> I tried running against a larger database (
>>> http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch )
>>> and I got this exception:
>>>
>>> http://gist.github.com/516491
>>>
>>> -Mikeal
>>>
>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds wrote:
>>>
>>>> Summing up what went on in IRC for those who were absent.
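The "ignore the error and keep moving" behaviour Adam describes can be modelled like this. A Python sketch with an invented toy term format (read_term here is a stand-in for couch_file's real term reader):

```python
import struct

def read_term(buf, pos):
    """Toy stand-in for couch_file's term reader: a 2-byte big-endian
    length followed by that many payload bytes (purely illustrative)."""
    if pos + 2 > len(buf):
        raise ValueError("truncated")
    (length,) = struct.unpack_from(">H", buf, pos)
    payload = buf[pos + 2:pos + 2 + length]
    if len(payload) != length:
        raise ValueError("file_corruption")
    return payload

def scan_candidates(buf, positions):
    """Try to decode every candidate position; a spurious candidate
    yields a recoverable error rather than a crash, mirroring a
    couch_file that returns {error, file_corruption} without
    terminating the gen_server."""
    found = []
    for pos in positions:
        try:
            found.append((pos, read_term(buf, pos)))
        except ValueError:
            continue  # spurious candidate; ignore and keep moving
    return found
```

The design point is the same as in the mail: decode failures are an expected outcome of guessing, so they are data to skip, not a reason to tear down (and then reopen) the file handle.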
>>>>
>>>> The latest progress is on Adam's branch at
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>>> lost+found/DbName database to which it merges all nodes not accessible
>>>> from anywhere (any other node found in a full file scan or any header
>>>> pointers).
>>>>
>>>> Currently, make_lost_and_found uses Volker's repair (from the
>>>> couch_db_repair_b module, also in Adam's branch).
>>>> Adam found that the bottleneck was couch_file calls and that the
>>>> repair process was taking a very long time, so he added
>>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as binary
>>>> and tries to process them to find nodes instead of scanning back one
>>>> byte at a time. It is currently not hooked up to the repair mechanism.
>>>>
>>>> Making progress. Go team.
>>>>
>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers wrote:
>>>>> jchris suggested on IRC that I try a normal doc update and see if
>>>>> that fixes it.
>>>>>
>>>>> It does. After a new doc was created the dbinfo doc count was back
>>>>> to normal.
>>>>>
>>>>> -Mikeal
>>>>>
>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers wrote:
>>>>>
>>>>>> Ok, I pulled down this code and tested against a database with a
>>>>>> ton of missing writes right before a single restart.
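The chunked approach described for find_nodes_quickly can be approximated like so. The 1 MB read size comes from the mail; the OVERLAP value and everything else is an assumed illustration, not the actual Erlang code:

```python
CHUNK = 1024 * 1024  # 1 MB reads, the size mentioned for find_nodes_quickly
OVERLAP = 16         # assumption: enough bytes to cover a candidate split across reads

def chunked_scan(path):
    """Yield (file_offset, buffer) pairs using large reads instead of
    one byte at a time. Buffers overlap slightly so a node candidate
    straddling two reads still appears whole in one buffer."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            buf = f.read(CHUNK + OVERLAP)
            if not buf:
                return
            yield offset, buf
            if len(buf) < CHUNK + OVERLAP:
                return  # short read: end of file
            offset += CHUNK
```

The win is the same one Adam reports: far fewer file-server round trips per byte scanned, at the cost of re-examining a few overlap bytes per chunk.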
>>>>>>
>>>>>> Before restart this was the database:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb",
>>>>>>   doc_count: 124969,
>>>>>>   doc_del_count: 0,
>>>>>>   update_seq: 124969,
>>>>>>   purge_seq: 0,
>>>>>>   compact_running: false,
>>>>>>   disk_size: 54857478,
>>>>>>   instance_start_time: "1281384140058211",
>>>>>>   disk_format_version: 5
>>>>>> }
>>>>>>
>>>>>> After restart it was this:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb",
>>>>>>   doc_count: 1,
>>>>>>   doc_del_count: 0,
>>>>>>   update_seq: 1,
>>>>>>   purge_seq: 0,
>>>>>>   compact_running: false,
>>>>>>   disk_size: 54857478,
>>>>>>   instance_start_time: "1281384593876026",
>>>>>>   disk_format_version: 5
>>>>>> }
>>>>>>
>>>>>> After repair, it's this:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb",
>>>>>>   doc_count: 1,
>>>>>>   doc_del_count: 0,
>>>>>>   update_seq: 124969,
>>>>>>   purge_seq: 0,
>>>>>>   compact_running: false,
>>>>>>   disk_size: 54857820,
>>>>>>   instance_start_time: "1281385990193289",
>>>>>>   disk_format_version: 5,
>>>>>>   committed_update_seq: 124969
>>>>>> }
>>>>>>
>>>>>> All the sequences are there and hitting _all_docs shows all the
>>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>>>>> <fdmanana@apache.org> wrote:
>>>>>>
>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>
>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>
>>>>>>> is working for at least simple cases. Use
>>>>>>> couch_db_repair:repair(DbNameAsString).
>>>>>>> There's one TODO: update the reduce values for the by_seq and by_id
>>>>>>> BTrees.
>>>>>>>
>>>>>>> If anyone wants to give some help on this, you're welcome.
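Mikeal's doc_count oddity lines up with Filipe's TODO: db_info reports doc_count from the by_id btree's stored reduce value, while _all_docs walks the whole tree, so a repaired tree with a stale reduction can show all the documents but the wrong count. A toy sketch of recomputing such a count reduction (invented structures, not CouchDB's real node format):

```python
def reduce_count(node):
    """Recompute the doc-count reduction bottom-up for a toy btree:
    leaves are ("kv", doc_id), inner nodes are ("kp", [children])."""
    kind, value = node
    if kind == "kv":
        return 1
    return sum(reduce_count(child) for child in value)

# A repaired root whose stored reduction was never refreshed would keep
# reporting the stale value; recomputing from the children fixes it:
by_id_root = ("kp", [("kv", "doc1"), ("kp", [("kv", "doc2"), ("kv", "doc3")])])
```

This also explains why a single ordinary doc update "fixed" the count: the update rewrote the path from leaf to root, refreshing the reductions along it.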
>>>>>>>
>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers wrote:
>>>>>>>
>>>>>>>> I'm starting to create a bunch of test db files that expose this
>>>>>>>> bug under different conditions like multiple restarts, across
>>>>>>>> compaction, variances in updates that might cause conflicts, etc.
>>>>>>>>
>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>
>>>>>>>> The README outlines what was done to the db's and what needs to
>>>>>>>> be recovered.
>>>>>>>>
>>>>>>>> -Mikeal
>>>>>>>>
>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>>>>>>>> <robert.newson@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Doesn't this bit:
>>>>>>>>>>
>>>>>>>>>> -        Db#db{waiting_delayed_commit=nil};
>>>>>>>>>> +        Db;
>>>>>>>>>> +        % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>
>>>>>>>>>> revert the bug fix?
>>>>>>>>>>
>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>> That patch obviously isn't anything close to final; it's still
>>>>>>>>> too experimental.
>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but he
>>>>>>>>>>> isn't done yet.
>>>>>>>>>>>
>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>
>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>
>>>>>>>>>>> It is not done and very early, but any help on this is greatly
>>>>>>>>>>> appreciated.
>>>>>>>>>>>
>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>> - I can detect that a file needs repair
>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>> - "only" missing: get the last db seq num
>>>>>>>>>>> - write a new header
>>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Jan
>>>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Filipe David Manana,
>>>>>>>>> fdmanana@apache.org
>>>>>>>>>
>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>> That's why all progress depends on unreasonable men."
>>>>>>>
>>>>>>> --
>>>>>>> Filipe David Manana,
>>>>>>> fdmanana@apache.org
>>>>>>>
>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>> That's why all progress depends on unreasonable men."