From: Adam Kocoloski
Subject: Re: data recovery tool progress
Date: Tue, 10 Aug 2010 02:26:51 -0400
In-Reply-To: <154AD543-C787-441C-851B-D59CEA6765CC@apache.org>
To: dev@couchdb.apache.org

With Randall's help we hooked the new node scanner up to the lost+found DB generator. It seems to work well enough for small DBs; for large DBs with lots of missing nodes the O(N^2) complexity of the problem catches up to the code and generating the lost+found DB takes quite some time. Mikeal is running tests tonight. The algorithm appears pretty CPU-limited, so a little parallelization may be warranted.

http://github.com/kocolosk/couchdb/tree/db_repair

Adam

(I sent this previous update to myself instead of the list, so I'll forward it here ...)

On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:

> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>
>> Right, make_lost_and_found still relies on code which reads through couch_file one byte at a time; that's the cause of the slowness. The newer scanner will improve that pretty dramatically, and we can tune it further by increasing the length of the pattern that we match when looking for kp/kv_node terms in the files, at the expense of some extra complexity dealing with the block prefixes (currently it does a 1-byte match, which as I understand it cannot be split across blocks).
>
> The scanner now looks for a 7-byte match, unless it is within 6 bytes of a block boundary, in which case it looks for the longest possible match at that position. The more specific match condition greatly reduces the number of calls to couch_file, and thus boosts the throughput. On my laptop it can scan testwritesdb.couch from Mikeal's couchtest repo (52 MB) in 18 seconds.
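Adam's boundary-aware matching can be sketched roughly as follows. This is an illustrative Python model, not the Erlang scanner itself: BLOCK_SIZE and the pattern bytes are made-up stand-ins for CouchDB's real block size and term prefix.

```python
BLOCK_SIZE = 4096                    # assumed block size, for illustration only
PATTERN = b"\x83h\x02d\x00\x07d"     # hypothetical 7-byte node-term prefix

def candidate_offsets(buf, pattern):
    """Scan a buffer for node-term candidates. Use the full pattern
    unless it would cross a block boundary (where a prefix byte may be
    interposed); then fall back to the longest match that fits."""
    hits = []
    for pos in range(len(buf)):
        to_boundary = BLOCK_SIZE - (pos % BLOCK_SIZE)
        n = min(len(pattern), to_boundary)
        if buf[pos:pos + n] == pattern[:n]:
            hits.append(pos)
    return hits
```

Note that the shortened match near a boundary necessarily admits more false positives, which is exactly why spurious candidates (and the file_corruption errors discussed below) are expected.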
>
>> Regarding the file_corruption error on the larger file, I think this is something we will just naturally trigger when we take a guess that random positions in a file are actually the beginning of a term. I think our best recourse here is to return {error, file_corruption} from couch_file but leave the gen_server up and running instead of terminating it. That way the repair code can ignore the error and keep moving without having to reopen the file.
>
> I committed this change (to my db_repair branch) after consulting with Chris. The longer match condition makes these spurious file_corruption triggers much less likely, but I think it's still a good thing not to crash the server when they happen.
>
>> Next steps as I understand them: Randall is working on integrating the in-memory scanner into Volker's code that finds all the dangling by_id nodes. I'm working on making sure that the scanner identifies btree node candidates which span block prefixes, and on improving its pattern matching.
>
> Latest from my end:
> http://github.com/kocolosk/couchdb/tree/db_repair
>
>> Adam
>>
>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>
>>> I pulled down the latest code from Adam's branch @
>>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>
>>> Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
>>> on a database with 200 lost updates spanning 200 restarts (
>>> http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch )
>>> took about 101 seconds.
>>>
>>> I tried running against a larger database (
>>> http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch )
>>> and I got this exception:
>>>
>>> http://gist.github.com/516491
>>>
>>> -Mikeal
>>>
>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds wrote:
>>>
>>>> Summing up what went on in IRC for those who were absent.
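The "ignore the error and keep moving" behaviour Adam describes can be modelled like this. A Python sketch with an invented toy term format (read_term here is a stand-in for couch_file's real term reader):

```python
import struct

def read_term(buf, pos):
    """Toy stand-in for couch_file's term reader: a 2-byte big-endian
    length followed by that many payload bytes (purely illustrative)."""
    if pos + 2 > len(buf):
        raise ValueError("truncated")
    (length,) = struct.unpack_from(">H", buf, pos)
    payload = buf[pos + 2:pos + 2 + length]
    if len(payload) != length:
        raise ValueError("file_corruption")
    return payload

def scan_candidates(buf, positions):
    """Try to decode every candidate position; a spurious candidate
    yields a recoverable error rather than a crash, mirroring a
    couch_file that returns {error, file_corruption} without
    terminating the gen_server."""
    found = []
    for pos in positions:
        try:
            found.append((pos, read_term(buf, pos)))
        except ValueError:
            continue  # spurious candidate; ignore and keep moving
    return found
```

The design point is the same as in the mail: decode failures are an expected outcome of guessing, so they are data to skip, not a reason to tear down (and then reopen) the file handle.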
>>>>
>>>> The latest progress is on Adam's branch at
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>>> lost+found/DbName database to which it merges all nodes not accessible
>>>> from anywhere (any other node found in a full file scan or any header
>>>> pointers).
>>>>
>>>> Currently, make_lost_and_found uses Volker's repair (from the
>>>> couch_db_repair_b module, also in Adam's branch).
>>>> Adam found that the bottleneck was couch_file calls and that the
>>>> repair process was taking a very long time, so he added
>>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as binary
>>>> and tries to process them to find nodes instead of scanning back one
>>>> byte at a time. It is currently not hooked up to the repair mechanism.
>>>>
>>>> Making progress. Go team.
>>>>
>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers wrote:
>>>>> jchris suggested on IRC that I try a normal doc update and see if
>>>>> that fixes it.
>>>>>
>>>>> It does. After a new doc was created the dbinfo doc count was back
>>>>> to normal.
>>>>>
>>>>> -Mikeal
>>>>>
>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers wrote:
>>>>>
>>>>>> Ok, I pulled down this code and tested against a database with a
>>>>>> ton of missing writes right before a single restart.
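The chunked approach described for find_nodes_quickly can be approximated like so. The 1 MB read size comes from the mail; the OVERLAP value and everything else is an assumed illustration, not the actual Erlang code:

```python
CHUNK = 1024 * 1024  # 1 MB reads, the size mentioned for find_nodes_quickly
OVERLAP = 16         # assumption: enough bytes to cover a candidate split across reads

def chunked_scan(path):
    """Yield (file_offset, buffer) pairs using large reads instead of
    one byte at a time. Buffers overlap slightly so a node candidate
    straddling two reads still appears whole in one buffer."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            buf = f.read(CHUNK + OVERLAP)
            if not buf:
                return
            yield offset, buf
            if len(buf) < CHUNK + OVERLAP:
                return  # short read: end of file
            offset += CHUNK
```

The win is the same one Adam reports: far fewer file-server round trips per byte scanned, at the cost of re-examining a few overlap bytes per chunk.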
>>>>>>
>>>>>> Before restart this was the database:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb",
>>>>>>   doc_count: 124969,
>>>>>>   doc_del_count: 0,
>>>>>>   update_seq: 124969,
>>>>>>   purge_seq: 0,
>>>>>>   compact_running: false,
>>>>>>   disk_size: 54857478,
>>>>>>   instance_start_time: "1281384140058211",
>>>>>>   disk_format_version: 5
>>>>>> }
>>>>>>
>>>>>> After restart it was this:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb",
>>>>>>   doc_count: 1,
>>>>>>   doc_del_count: 0,
>>>>>>   update_seq: 1,
>>>>>>   purge_seq: 0,
>>>>>>   compact_running: false,
>>>>>>   disk_size: 54857478,
>>>>>>   instance_start_time: "1281384593876026",
>>>>>>   disk_format_version: 5
>>>>>> }
>>>>>>
>>>>>> After repair, it's this:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb",
>>>>>>   doc_count: 1,
>>>>>>   doc_del_count: 0,
>>>>>>   update_seq: 124969,
>>>>>>   purge_seq: 0,
>>>>>>   compact_running: false,
>>>>>>   disk_size: 54857820,
>>>>>>   instance_start_time: "1281385990193289",
>>>>>>   disk_format_version: 5,
>>>>>>   committed_update_seq: 124969
>>>>>> }
>>>>>>
>>>>>> All the sequences are there and hitting _all_docs shows all the
>>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>>>>> <fdmanana@apache.org> wrote:
>>>>>>
>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>
>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>
>>>>>>> is working for at least simple cases. Use
>>>>>>> couch_db_repair:repair(DbNameAsString).
>>>>>>> There's one TODO: update the reduce values for the by_seq and by_id
>>>>>>> BTrees.
>>>>>>>
>>>>>>> If anyone wants to give some help on this, you're welcome.
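Mikeal's doc_count oddity lines up with Filipe's TODO: db_info reports doc_count from the by_id btree's stored reduce value, while _all_docs walks the whole tree, so a repaired tree with a stale reduction can show all the documents but the wrong count. A toy sketch of recomputing such a count reduction (invented structures, not CouchDB's real node format):

```python
def reduce_count(node):
    """Recompute the doc-count reduction bottom-up for a toy btree:
    leaves are ("kv", doc_id), inner nodes are ("kp", [children])."""
    kind, value = node
    if kind == "kv":
        return 1
    return sum(reduce_count(child) for child in value)

# A repaired root whose stored reduction was never refreshed would keep
# reporting the stale value; recomputing from the children fixes it:
by_id_root = ("kp", [("kv", "doc1"), ("kp", [("kv", "doc2"), ("kv", "doc3")])])
```

This also explains why a single ordinary doc update "fixed" the count: the update rewrote the path from leaf to root, refreshing the reductions along it.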
>>>>>>>
>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers wrote:
>>>>>>>
>>>>>>>> I'm starting to create a bunch of test db files that expose this
>>>>>>>> bug under different conditions like multiple restarts, across
>>>>>>>> compaction, variances in updates that might cause conflicts, etc.
>>>>>>>>
>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>
>>>>>>>> The README outlines what was done to the db's and what needs to
>>>>>>>> be recovered.
>>>>>>>>
>>>>>>>> -Mikeal
>>>>>>>>
>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>>>>>>>> <robert.newson@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Doesn't this bit:
>>>>>>>>>>
>>>>>>>>>> -        Db#db{waiting_delayed_commit=nil};
>>>>>>>>>> +        Db;
>>>>>>>>>> +        % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>
>>>>>>>>>> revert the bug fix?
>>>>>>>>>>
>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>> That patch obviously isn't anything close to final; it's still
>>>>>>>>> too experimental.
>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but he
>>>>>>>>>>> isn't done yet.
>>>>>>>>>>>
>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>
>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>
>>>>>>>>>>> It is not done and very early, but any help on this is greatly
>>>>>>>>>>> appreciated.
>>>>>>>>>>>
>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>> - I can detect that a file needs repair
>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>> - "only" missing: get the last db seq num
>>>>>>>>>>> - write a new header
>>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Jan
>>>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Filipe David Manana,
>>>>>>>>> fdmanana@apache.org
>>>>>>>>>
>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>> That's why all progress depends on unreasonable men."
>>>>>>>
>>>>>>> --
>>>>>>> Filipe David Manana,
>>>>>>> fdmanana@apache.org
>>>>>>>
>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>> That's why all progress depends on unreasonable men."