incubator-couchdb-dev mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: data recovery tool progress
Date Tue, 10 Aug 2010 10:01:16 GMT
Filipe,
I'm not sure which changes you're talking about exactly, but I know
Adam and I decided to use the old gen_server:call({pread, ...})
interface, which can read arbitrary positions as binaries. The reason
for this is so that the scanner can read large chunks in one call and
then analyze them for node terms.
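
To make that concrete, here's a rough sketch of the idea (the message
shape, the 1MB chunk size, and the find_candidates/1 helper are all
illustrative, not code from any of the branches):

    %% Read the file in large chunks with the raw pread call and scan
    %% each chunk in memory, instead of issuing one couch_file call per
    %% byte. A real scanner must also handle candidates that straddle a
    %% chunk boundary; offsets from find_candidates/1 are chunk-relative.
    scan_file(Fd, FileSize) ->
        scan_file(Fd, FileSize, 0, []).

    scan_file(_Fd, FileSize, Pos, Acc) when Pos >= FileSize ->
        lists:append(lists:reverse(Acc));
    scan_file(Fd, FileSize, Pos, Acc) ->
        Len = erlang:min(1024 * 1024, FileSize - Pos),
        {ok, Bin} = gen_server:call(Fd, {pread, Pos, Len}, infinity),
        scan_file(Fd, FileSize, Pos + Len, [find_candidates(Bin) | Acc]).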

On Tue, Aug 10, 2010 at 02:46, Filipe David Manana <fdmanana@apache.org> wrote:
> Is it just my impression, or do the forks I looked at (Volker's,
> Adam's, Randall's) not use the changes I made to couch_file? They were
> needed to try reading terms from random positions in the DB file,
> because if we try to read from a bad position, the couch_file
> gen_server crashes and is never restarted (it's not under a
> supervision tree).
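>
> To illustrate the failure mode: without those changes, probing a bad
> offset kills the couch_file process, and every later call on that Fd
> fails. A hypothetical guard (not code from any branch) would have to
> catch the exit, and the file would still need to be reopened:
>
>     try_pread_term(Fd, Pos) ->
>         try
>             couch_file:pread_term(Fd, Pos)
>         catch
>             exit:_ ->
>                 %% the gen_server died on the bad read; this Fd is
>                 %% now unusable until the file is reopened
>                 {error, file_corruption}
>         end.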
>
>
> On Tue, Aug 10, 2010 at 10:28 AM, Filipe David Manana
> <fdmanana@apache.org> wrote:
>
>>
>>
>> On Tue, Aug 10, 2010 at 9:55 AM, Robert Newson
>> <robert.newson@gmail.com> wrote:
>>
>>> I ran the db_repair code on a healthy database produced with
>>> delayed_commits=true.
>>>
>>> The source db had 3218 docs. db_repair recovered 3120 and then returned
>>> with ok.
>>>
>>
>> When a DB is repaired, couch_db_repair:repair/1 returns something
>> matching {ok, repaired, _BTreeInfos}.
>> If it returns only the atom 'ok', it means it did nothing to the DB
>> file. At least that's the behaviour of my original code; I don't know
>> if the forks changed it.
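>>
>> In other words, callers should match on the full return value;
>> something like this sketch:
>>
>>     case couch_db_repair:repair(DbName) of
>>         {ok, repaired, BTreeInfos} ->
>>             io:format("repaired, btree infos: ~p~n", [BTreeInfos]);
>>         ok ->
>>             io:format("nothing to repair~n")
>>     end.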
>>
>>
>>>
>>> I'm redoing that test, but this indicates we're not finding all roots.
>>>
>>> I note that the output file was 36 times the size of the input file,
>>> which is a consequence of folding all possible roots. I think that
>>> needs to be in the release notes for the repair tool if that
>>> behavior remains when it ships.
>>>
>>> B.
>>>
>>> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers <mikeal.rogers@gmail.com>
>>> wrote:
>>> > I think I found a bug in the current lost+found repair.
>>> >
>>> > I've been running it against the testwritesdb and it's in a state
>>> > where it never finishes.
>>> >
>>> > It's still spitting out these lines:
>>> >
>>> > [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>> >
>>> > Most are 1001, but there are also other random values: 452, 866, etc.
>>> >
>>> > But the file size and dbinfo haven't budged in over 30 minutes. The
>>> > size is stuck at 34300002, with the original db file being 54857478.
>>> >
>>> > This database only has one document in it that isn't "lost" so if it's
>>> > finding *any* new docs it should be writing them.
>>> >
>>> > I also started another job to recover a production db that is quite
>>> > large, 500megs, with the missing data a week or so back. This has
>>> > been running for 2 hours and has still not output anything or
>>> > created the lost and found db, so I can only assume that it is in
>>> > the same state.
>>> >
>>> > Both machines are still churning 100% CPU.
>>> >
>>> > -Mikeal
>>> >
>>> >
>>> > On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski <kocolosk@apache.org>
>>> > wrote:
>>> >
>>> >> With Randall's help we hooked the new node scanner up to the
>>> >> lost+found DB generator.  It seems to work well enough for small
>>> >> DBs; for large DBs with lots of missing nodes the O(N^2) complexity
>>> >> of the problem catches up to the code and generating the lost+found
>>> >> DB takes quite some time.  Mikeal is running tests tonight.  The
>>> >> algo appears pretty CPU-limited, so a little parallelization may be
>>> >> warranted.
>>> >>
>>> >> http://github.com/kocolosk/couchdb/tree/db_repair
>>> >>
>>> >> Adam
>>> >>
>>> >> (I sent this previous update to myself instead of the list, so I'll
>>> >> forward it here ...)
>>> >>
>>> >> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>> >>
>>> >> > On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>> >> >
>>> >> >> Right, make_lost_and_found still relies on code which reads
>>> >> >> through couch_file one byte at a time; that's the cause of the
>>> >> >> slowness.  The newer scanner will improve that pretty
>>> >> >> dramatically, and we can tune it further by increasing the length
>>> >> >> of the pattern that we match when looking for kp/kv_node terms in
>>> >> >> the files, at the expense of some extra complexity dealing with
>>> >> >> the block prefixes (currently it does a 1-byte match, which as I
>>> >> >> understand it cannot be split across blocks).
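>>> >> >>
>>> >> >> For context: couch_file writes a marker byte at the start of
>>> >> >> every 4096-byte block, so a term that crosses a block boundary
>>> >> >> is not contiguous on disk. A scanner has to strip those bytes
>>> >> >> before deserializing. Roughly, as a sketch:
>>> >> >>
>>> >> >>     remove_block_prefixes(<<>>, _BlockOffset) ->
>>> >> >>         [];
>>> >> >>     remove_block_prefixes(Bin, 0) ->
>>> >> >>         %% the first byte of each block is the marker; drop it
>>> >> >>         <<_Marker, Rest/binary>> = Bin,
>>> >> >>         remove_block_prefixes(Rest, 1);
>>> >> >>     remove_block_prefixes(Bin, BlockOffset) ->
>>> >> >>         Take = erlang:min(4096 - BlockOffset, byte_size(Bin)),
>>> >> >>         <<Data:Take/binary, Rest/binary>> = Bin,
>>> >> >>         [Data | remove_block_prefixes(Rest, 0)].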
>>> >> >
>>> >> > The scanner now looks for a 7 byte match, unless it is within 6
>>> >> > bytes of a block boundary, in which case it looks for the longest
>>> >> > possible match at that position.  The more specific match
>>> >> > condition greatly reduces the # of calls to couch_file, and thus
>>> >> > boosts the throughput.  On my laptop it can scan the
>>> >> > testwritesdb.couch from Mikeal's couchtest repo (52 MB) in 18
>>> >> > seconds.
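>>> >> >
>>> >> > For the curious, a 7-byte match can be as simple as looking for
>>> >> > the serialized prefix of a node tuple (the exact bytes here are
>>> >> > my assumption): term_to_binary/1 encodes {kp_node, _} and
>>> >> > {kv_node, _} starting with the version byte 131, the 2-tuple
>>> >> > header 104,2, and the 7-character atom header 100,0,7, followed
>>> >> > by $k:
>>> >> >
>>> >> >     -define(NODE_HINT, <<131, 104, 2, 100, 0, 7, $k>>).
>>> >> >
>>> >> >     candidate_offsets(Bin) ->
>>> >> >         [Pos || {Pos, _Len} <- binary:matches(Bin, ?NODE_HINT)].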
>>> >> >
>>> >> >> Regarding the file_corruption error on the larger file, I think
>>> >> >> this is something we will just naturally trigger when we take a
>>> >> >> guess that random positions in a file are actually the beginning
>>> >> >> of a term.  I think our best recourse here is to return {error,
>>> >> >> file_corruption} from couch_file but leave the gen_server up and
>>> >> >> running instead of terminating it.  That way the repair code can
>>> >> >> ignore the error and keep moving without having to reopen the
>>> >> >> file.
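>>> >> >>
>>> >> >> The shape of that change would be roughly this (a sketch only;
>>> >> >> read_term_at/2 stands in for the real read path inside
>>> >> >> couch_file's gen_server):
>>> >> >>
>>> >> >>     handle_call({pread_term, Pos}, _From, File) ->
>>> >> >>         Reply = try read_term_at(File, Pos)
>>> >> >>                 catch _:_ -> {error, file_corruption}
>>> >> >>                 end,
>>> >> >>         {reply, Reply, File}.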
>>> >> >
>>> >> > I committed this change (to my db_repair branch) after consulting
>>> >> > with Chris.  The longer match condition makes these spurious
>>> >> > file_corruption triggers much less likely, but I think it's still
>>> >> > a good thing not to crash the server when they happen.
>>> >> >
>>> >> >> Next steps as I understand them: Randall is working on
>>> >> >> integrating the in-memory scanner into Volker's code that finds
>>> >> >> all the dangling by_id nodes.  I'm working on making sure that
>>> >> >> the scanner identifies bt node candidates which span block
>>> >> >> prefixes, and on improving its pattern-matching.
>>> >> >
>>> >> > Latest from my end
>>> >> > http://github.com/kocolosk/couchdb/tree/db_repair
>>> >> >
>>> >> >>
>>> >> >> Adam
>>> >> >>
>>> >> >> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>> >> >>
>>> >> >>> I pulled down the latest code from Adam's branch @
>>> >> >>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>> >> >>>
>>> >> >>> Running timer:tc(couch_db_repair, make_lost_and_found,
>>> >> >>> ["multi_conflict"]). on a database with 200 lost updates
>>> >> >>> spanning 200 restarts
>>> >> >>> (http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch)
>>> >> >>> took about 101 seconds.
>>> >> >>>
>>> >> >>> I tried running against a larger database
>>> >> >>> (http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch)
>>> >> >>> and I got this exception:
>>> >> >>>
>>> >> >>> http://gist.github.com/516491
>>> >> >>>
>>> >> >>> -Mikeal
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds
>>> >> >>> <randall.leeds@gmail.com> wrote:
>>> >> >>>
>>> >> >>>> Summing up what went on in IRC for those who were absent.
>>> >> >>>>
>>> >> >>>> The latest progress is on Adam's branch at
>>> >> >>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>> >> >>>>
>>> >> >>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>> >> >>>> lost+found/DbName database into which it merges all nodes not
>>> >> >>>> accessible from anywhere (any other node found in a full file
>>> >> >>>> scan or any header pointers).
>>> >> >>>>
>>> >> >>>> Currently, make_lost_and_found uses Volker's repair (from the
>>> >> >>>> couch_db_repair_b module, also in Adam's branch).
>>> >> >>>> Adam found that the bottleneck was couch_file calls and that
>>> >> >>>> the repair process was taking a very long time, so he added
>>> >> >>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as
>>> >> >>>> binary and tries to process them to find nodes instead of
>>> >> >>>> scanning back one byte at a time. It is currently not hooked up
>>> >> >>>> to the repair mechanism.
>>> >> >>>>
>>> >> >>>> Making progress. Go team.
>>> >> >>>>
>>> >> >>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers
>>> >> >>>> <mikeal.rogers@gmail.com> wrote:
>>> >> >>>>> jchris suggested on IRC that I try a normal doc update and
>>> >> >>>>> see if that fixes it.
>>> >> >>>>>
>>> >> >>>>> It does. After a new doc was created, the dbinfo doc count
>>> >> >>>>> was back to normal.
>>> >> >>>>>
>>> >> >>>>> -Mikeal
>>> >> >>>>>
>>> >> >>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers
>>> >> >>>>> <mikeal.rogers@gmail.com> wrote:
>>> >> >>>>>
>>> >> >>>>>> Ok, I pulled down this code and tested against a database
>>> >> >>>>>> with a ton of missing writes right before a single restart.
>>> >> >>>>>>
>>> >> >>>>>> Before restart this was the database:
>>> >> >>>>>>
>>> >> >>>>>> {
>>> >> >>>>>> db_name: "testwritesdb"
>>> >> >>>>>> doc_count: 124969
>>> >> >>>>>> doc_del_count: 0
>>> >> >>>>>> update_seq: 124969
>>> >> >>>>>> purge_seq: 0
>>> >> >>>>>> compact_running: false
>>> >> >>>>>> disk_size: 54857478
>>> >> >>>>>> instance_start_time: "1281384140058211"
>>> >> >>>>>> disk_format_version: 5
>>> >> >>>>>> }
>>> >> >>>>>>
>>> >> >>>>>> After restart it was this:
>>> >> >>>>>>
>>> >> >>>>>> {
>>> >> >>>>>> db_name: "testwritesdb"
>>> >> >>>>>> doc_count: 1
>>> >> >>>>>> doc_del_count: 0
>>> >> >>>>>> update_seq: 1
>>> >> >>>>>> purge_seq: 0
>>> >> >>>>>> compact_running: false
>>> >> >>>>>> disk_size: 54857478
>>> >> >>>>>> instance_start_time: "1281384593876026"
>>> >> >>>>>> disk_format_version: 5
>>> >> >>>>>> }
>>> >> >>>>>>
>>> >> >>>>>> After repair, it's this:
>>> >> >>>>>>
>>> >> >>>>>> {
>>> >> >>>>>> db_name: "testwritesdb"
>>> >> >>>>>> doc_count: 1
>>> >> >>>>>> doc_del_count: 0
>>> >> >>>>>> update_seq: 124969
>>> >> >>>>>> purge_seq: 0
>>> >> >>>>>> compact_running: false
>>> >> >>>>>> disk_size: 54857820
>>> >> >>>>>> instance_start_time: "1281385990193289"
>>> >> >>>>>> disk_format_version: 5
>>> >> >>>>>> committed_update_seq: 124969
>>> >> >>>>>> }
>>> >> >>>>>>
>>> >> >>>>>> All the sequences are there and hitting _all_docs shows all
>>> >> >>>>>> the documents, so why is the doc_count only 1 in the dbinfo?
>>> >> >>>>>>
>>> >> >>>>>> -Mikeal
>>> >> >>>>>>
>>> >> >>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>> >> >>>>>> <fdmanana@apache.org> wrote:
>>> >> >>>>>>
>>> >> >>>>>>> For the record (and people not on IRC), the code at:
>>> >> >>>>>>>
>>> >> >>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>> >> >>>>>>>
>>> >> >>>>>>> is working for at least simple cases. Use
>>> >> >>>>>>> couch_db_repair:repair(DbNameAsString).
>>> >> >>>>>>> There's one TODO: update the reduce values for the by_seq
>>> >> >>>>>>> and by_id BTrees.
>>> >> >>>>>>>
>>> >> >>>>>>> If anyone wants to give some help on this, you're welcome.
>>> >> >>>>>>>
>>> >> >>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers
>>> >> >>>>>>> <mikeal.rogers@gmail.com> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> I'm starting to create a bunch of test db files that
>>> >> >>>>>>>> expose this bug under different conditions like multiple
>>> >> >>>>>>>> restarts, across compaction, variances in updates that
>>> >> >>>>>>>> might cause conflicts, etc.
>>> >> >>>>>>>>
>>> >> >>>>>>>> http://github.com/mikeal/couchtest
>>> >> >>>>>>>>
>>> >> >>>>>>>> The README outlines what was done to the DBs and what
>>> >> >>>>>>>> needs to be recovered.
>>> >> >>>>>>>>
>>> >> >>>>>>>> -Mikeal
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>> >> >>>>>>>> <fdmanana@apache.org> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>> >> >>>>>>>>> <robert.newson@gmail.com> wrote:
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>> Doesn't this bit:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> -        Db#db{waiting_delayed_commit=nil};
>>> >> >>>>>>>>>> +        Db;
>>> >> >>>>>>>>>> +        % Db#db{waiting_delayed_commit=nil};
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> revert the bug fix?
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> That's intentional, for my local testing.
>>> >> >>>>>>>>> That patch obviously isn't anything close to final; it's
>>> >> >>>>>>>>> still too experimental.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> B.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt
>>> >> >>>>>>>>>> <jan@apache.org> wrote:
>>> >> >>>>>>>>>>> Hi All,
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Filipe jumped in to start working on the recovery tool,
>>> >> >>>>>>>>>>> but he isn't done yet.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Here's the current patch:
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> It is not done and still very early, but any help on
>>> >> >>>>>>>>>>> this is greatly appreciated.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> The current state is (in Filipe's words):
>>> >> >>>>>>>>>>> - I can detect that a file needs repair
>>> >> >>>>>>>>>>> - and get the last btree roots from it
>>> >> >>>>>>>>>>> - "only" missing: get the last db seq num
>>> >> >>>>>>>>>>> - write a new header (see the sketch below)
>>> >> >>>>>>>>>>> - and deal with the local docs btree (if it exists)
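>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> For the "write new header" step, the couch_file calls
>>> >> >>>>>>>>>>> look roughly like this (a sketch; building the Header
>>> >> >>>>>>>>>>> term itself is the hard part and is omitted here):
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>     {ok, Fd} = couch_file:open(DbFilePath),
>>> >> >>>>>>>>>>>     ok = couch_file:write_header(Fd, Header),
>>> >> >>>>>>>>>>>     ok = couch_file:sync(Fd).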
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Thanks!
>>> >> >>>>>>>>>>> Jan
>>> >> >>>>>>>>>>> --
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> --
>>> >> >>>>>>>>> Filipe David Manana,
>>> >> >>>>>>>>> fdmanana@apache.org
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> "Reasonable men adapt themselves
to the world.
>>> >> >>>>>>>>> Unreasonable men adapt the world
to themselves.
>>> >> >>>>>>>>> That's why all progress depends
on unreasonable men."
>>> >> >>>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> Filipe David Manana,
>>> >> >>>>>>> fdmanana@apache.org
>>> >> >>>>>>>
>>> >> >>>>>>> "Reasonable men adapt themselves to the
world.
>>> >> >>>>>>> Unreasonable men adapt the world to themselves.
>>> >> >>>>>>> That's why all progress depends on unreasonable
men."
>>> >> >>>>>>>
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >
>>>
>>
>>
>>
>> --
>> Filipe David Manana,
>> fdmanana@apache.org
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>>
>>
>
>
> --
> Filipe David Manana,
> fdmanana@apache.org
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
>
