From: Filipe David Manana <fdmanana@gmail.com>
To: dev@couchdb.apache.org
Date: Tue, 10 Aug 2010 10:28:36 +0100
Subject: Re: data recovery tool progress

On Tue, Aug 10, 2010 at 9:55 AM, Robert Newson wrote:
> I ran the db_repair code on a healthy database produced with
> delayed_commits=true.
>
> The source db had 3218 docs. db_repair recovered 3120 and then returned
> with ok.

When a DB is repaired, couch_db_repair:repair/1 returns something matching
{ok, repaired, _BTreeInfos}. If it returns only the atom 'ok', it means it
did nothing to the DB file. At least that is how my original code behaves;
I don't know whether the forks changed that.

> I'm redoing that test, but this indicates we're not finding all roots.
>
> I note that the output file was 36 times the size of the input file,
> which is a consequence of folding all possible roots. I think that needs
> to be in the release notes for the repair tool if that behaviour remains
> when it ships.
>
> B.
>
> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers wrote:
>
>> I think I found a bug in the current lost+found repair.
>>
>> I've been running it against the testwritesdb and it's in a state that
>> never finishes.
>>
>> It's still spitting out these lines:
>>
>>   [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>
>> Most are 1001, but there are other random variances: 452, 866, etc.
>>
>> But the file size and dbinfo haven't budged in over 30 minutes. The
>> size is stuck at 34300002, with the original db file being 54857478.
>>
>> This database only has one document in it that isn't "lost", so if
>> it's finding *any* new docs it should be writing them.
>>
>> I also started another job to recover a production db that is quite
>> large, 500 megs, with the missing data a week or so back. This has
>> been running for 2 hours and has still not output anything or created
>> the lost and found db, so I can only assume that it is in the same
>> state.
>>
>> Both machines are still churning 100% CPU.
>>
>> -Mikeal
>>
>> On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski wrote:
>>
>>> With Randall's help we hooked the new node scanner up to the
>>> lost+found DB generator. It seems to work well enough for small DBs;
>>> for large DBs with lots of missing nodes the O(N^2) complexity of the
>>> problem catches up to the code and generating the lost+found DB takes
>>> quite some time. Mikeal is running tests tonight. The algo appears
>>> pretty CPU-limited, so a little parallelization may be warranted.
>>>
>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>
>>> Adam
>>>
>>> (I sent this previous update to myself instead of the list, so I'll
>>> forward it here ...)
>>>
>>> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>>
>>>> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>>>
>>>>> Right, make_lost_and_found still relies on code which reads through
>>>>> couch_file one byte at a time; that's the cause of the slowness.
>>>>> The newer scanner will improve that pretty dramatically, and we can
>>>>> tune it further by increasing the length of the pattern that we
>>>>> match when looking for kp/kv_node terms in the files, at the expense
>>>>> of some extra complexity dealing with the block prefixes (currently
>>>>> it does a 1-byte match, which as I understand it cannot be split
>>>>> across blocks).
>>>>
>>>> The scanner now looks for a 7-byte match, unless it is within 6 bytes
>>>> of a block boundary, in which case it looks for the longest possible
>>>> match at that position. The more specific match condition greatly
>>>> reduces the number of calls to couch_file, and thus boosts the
>>>> throughput. On my laptop it can scan the testwritesdb.couch from
>>>> Mikeal's couchtest repo (52 MB) in 18 seconds.
>>>>
>>>>> Regarding the file_corruption error on the larger file, I think this
>>>>> is something we will just naturally trigger when we take a guess
>>>>> that random positions in a file are actually the beginning of a
>>>>> term. I think our best recourse here is to return {error,
>>>>> file_corruption} from couch_file but leave the gen_server up and
>>>>> running instead of terminating it. That way the repair code can
>>>>> ignore the error and keep moving without having to reopen the file.
>>>>
>>>> I committed this change (to my db_repair branch) after consulting
>>>> with Chris. The longer match condition makes these spurious
>>>> file_corruption triggers much less likely, but I think it's still a
>>>> good thing not to crash the server when they happen.
>>>>
>>>>> Next steps as I understand them: Randall is working on integrating
>>>>> the in-memory scanner into Volker's code that finds all the dangling
>>>>> by_id nodes. I'm working on making sure that the scanner identifies
>>>>> btree node candidates which span block prefixes, and on improving
>>>>> its pattern-matching.
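[Editor's note: a concept-level sketch of the block-prefix matching described above, in Python for readability rather than the project's Erlang. The 4 KB block size, the 1-byte prefix, and the pattern used are illustrative assumptions, not details taken from the patch.]

```python
# Illustrative sketch only (not CouchDB's actual scanner). Assumes a
# couch_file-style layout in which every BLOCK_SIZE bytes begin with a
# PREFIX_LEN-byte marker that interrupts the serialized data; both
# constants are assumptions for illustration.
BLOCK_SIZE = 4096
PREFIX_LEN = 1

def match_at(buf: bytes, pos: int, pattern: bytes) -> bool:
    """True if `pattern` occurs at `pos`, stepping over the prefix byte
    found at every block boundary (so a match may straddle two blocks)."""
    i = pos
    for byte in pattern:
        if i % BLOCK_SIZE == 0:  # a block-prefix byte sits here; skip it
            i += PREFIX_LEN
        if i >= len(buf) or buf[i] != byte:
            return False
        i += 1
    return True

def find_candidates(buf: bytes, pattern: bytes) -> list:
    """Every offset at which `pattern` matches, block prefixes included."""
    return [p for p in range(len(buf)) if match_at(buf, p, pattern)]
```

The trade-off mentioned above shows up directly here: a longer pattern makes false candidates far rarer (fewer couch_file calls to verify them), at the cost of the boundary bookkeeping in match_at.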
>>>>
>>>> Latest from my end:
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>>> Adam
>>>>>
>>>>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>>>>
>>>>>> I pulled down the latest code from Adam's branch @
>>>>>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>>>>
>>>>>> Running timer:tc(couch_db_repair, make_lost_and_found,
>>>>>> ["multi_conflict"]). on a database with 200 lost updates spanning
>>>>>> 200 restarts
>>>>>> (http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch)
>>>>>> took about 101 seconds.
>>>>>>
>>>>>> I tried running against a larger database
>>>>>> (http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch)
>>>>>> and I got this exception:
>>>>>>
>>>>>> http://gist.github.com/516491
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds
>>>>>> <randall.leeds@gmail.com> wrote:
>>>>>>
>>>>>>> Summing up what went on in IRC for those who were absent.
>>>>>>>
>>>>>>> The latest progress is on Adam's branch at
>>>>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>>>>
>>>>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>>>>>> lost+found/DbName database into which it merges all nodes not
>>>>>>> accessible from anywhere (any other node found in a full file scan
>>>>>>> or any header pointers).
>>>>>>>
>>>>>>> Currently, make_lost_and_found uses Volker's repair (from the
>>>>>>> couch_db_repair_b module, also in Adam's branch).
>>>>>>> Adam found that the bottleneck was couch_file calls and that the
>>>>>>> repair process was taking a very long time, so he added
>>>>>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as
>>>>>>> binary and processes them to find nodes instead of scanning back
>>>>>>> one byte at a time. It is currently not hooked up to the repair
>>>>>>> mechanism.
>>>>>>>
>>>>>>> Making progress. Go team.
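[Editor's note: the chunked scan described above (large binary reads processed in memory, instead of one couch_file call per byte) can be sketched roughly as follows. This is a hypothetical Python illustration: only the 1 MB chunk size comes from the thread; the function name, the overlap handling, and searching for a literal byte pattern are assumptions.]

```python
# Concept sketch of a chunked file scan (hypothetical code, not the
# find_nodes_quickly/1 implementation): read the file in 1 MB slabs and
# search in memory, carrying len(pattern) - 1 trailing bytes between
# reads so a match spanning two reads is still found.
def scan_file(path: str, pattern: bytes, chunk_size: int = 1024 * 1024) -> list:
    hits = []
    overlap = len(pattern) - 1
    offset = 0   # absolute file offset of the next unread byte
    tail = b""   # trailing bytes carried over from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            base = offset - len(tail)  # absolute offset of buf[0]
            pos = buf.find(pattern)
            while pos != -1:
                hits.append(base + pos)
                pos = buf.find(pattern, pos + 1)
            tail = buf[-overlap:] if overlap else b""
            offset += len(chunk)
    return hits
```

Since the carried-over tail is one byte shorter than the pattern, a match starting in the tail could not have been completed in the previous read, so no offset is ever reported twice.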
>>>>>>>
>>>>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers
>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>
>>>>>>>> jchris suggested on IRC that I try a normal doc update and see
>>>>>>>> if that fixes it.
>>>>>>>>
>>>>>>>> It does. After a new doc was created the dbinfo doc count was
>>>>>>>> back to normal.
>>>>>>>>
>>>>>>>> -Mikeal
>>>>>>>>
>>>>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers
>>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ok, I pulled down this code and tested against a database with
>>>>>>>>> a ton of missing writes right before a single restart.
>>>>>>>>>
>>>>>>>>> Before restart this was the database:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   db_name: "testwritesdb",
>>>>>>>>>   doc_count: 124969,
>>>>>>>>>   doc_del_count: 0,
>>>>>>>>>   update_seq: 124969,
>>>>>>>>>   purge_seq: 0,
>>>>>>>>>   compact_running: false,
>>>>>>>>>   disk_size: 54857478,
>>>>>>>>>   instance_start_time: "1281384140058211",
>>>>>>>>>   disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After restart it was this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   db_name: "testwritesdb",
>>>>>>>>>   doc_count: 1,
>>>>>>>>>   doc_del_count: 0,
>>>>>>>>>   update_seq: 1,
>>>>>>>>>   purge_seq: 0,
>>>>>>>>>   compact_running: false,
>>>>>>>>>   disk_size: 54857478,
>>>>>>>>>   instance_start_time: "1281384593876026",
>>>>>>>>>   disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After repair, it's this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   db_name: "testwritesdb",
>>>>>>>>>   doc_count: 1,
>>>>>>>>>   doc_del_count: 0,
>>>>>>>>>   update_seq: 124969,
>>>>>>>>>   purge_seq: 0,
>>>>>>>>>   compact_running: false,
>>>>>>>>>   disk_size: 54857820,
>>>>>>>>>   instance_start_time: "1281385990193289",
>>>>>>>>>   disk_format_version: 5,
>>>>>>>>>   committed_update_seq: 124969
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> All the sequences are there and hitting _all_docs shows all the
>>>>>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>>>>
>>>>>>>>> -Mikeal
>>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>>>>
>>>>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>>>>
>>>>>>>>>> is working for at least simple cases. Use
>>>>>>>>>> couch_db_repair:repair(DbNameAsString).
>>>>>>>>>> There's one TODO: update the reduce values for the by_seq and
>>>>>>>>>> by_id BTrees.
>>>>>>>>>>
>>>>>>>>>> If anyone wants to give some help on this, you're welcome.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers
>>>>>>>>>> <mikeal.rogers@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm starting to create a bunch of test db files that expose
>>>>>>>>>>> this bug under different conditions like multiple restarts,
>>>>>>>>>>> across compaction, variances in updates that might cause
>>>>>>>>>>> conflicts, etc.
>>>>>>>>>>>
>>>>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>>>>
>>>>>>>>>>> The README outlines what was done to the db's and what needs
>>>>>>>>>>> to be recovered.
>>>>>>>>>>>
>>>>>>>>>>> -Mikeal
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>>>>>>>>>> <fdmanana@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>>>>>>>>>>> <robert.newson@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Doesn't this bit:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>> + Db;
>>>>>>>>>>>>> + % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>>
>>>>>>>>>>>>> revert the bug fix?
>>>>>>>>>>>>
>>>>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>>>>> That patch obviously isn't anything close to final; it's
>>>>>>>>>>>> still too experimental.
>>>>>>>>>>>>
>>>>>>>>>>>>> B.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Filipe jumped in to start working on the recovery tool,
>>>>>>>>>>>>>> but he isn't done yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is not done and very early, but any help on this is
>>>>>>>>>>>>>> greatly appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>>>>> - I can detect that a file needs repair
>>>>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>>>>> - "only" missing: get the last db seq num
>>>>>>>>>>>>>> - write a new header
>>>>>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Filipe David Manana,
>>>>>>>>>>>> fdmanana@apache.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Filipe David Manana,
>>>>>>>>>> fdmanana@apache.org
--
Filipe David Manana,
fdmanana@apache.org

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."