couchdb-dev mailing list archives

From Filipe David Manana <fdman...@apache.org>
Subject Re: data recovery tool progress
Date Tue, 10 Aug 2010 09:46:12 GMT
Is it just my impression, or do the forks I looked at (Volker's, Adam's,
Randall's) not use the changes I made to couch_file? Those changes were needed
for reading terms from random positions in the DB file: if we try to read from
a bad position, the couch_file gen_server crashes and is never restarted (it's
not under a supervision tree).
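
For context, the failure mode is roughly this: repair has to guess that
arbitrary file offsets are the start of a serialized term, and a wrong guess
used to take the couch_file gen_server down for good. A minimal sketch of the
kind of defensive read I mean (read_term_at/2 is a hypothetical helper,
simplified to a plain length-prefixed term and ignoring CouchDB's block
prefixes; not the actual patch):

    %% Try to decode a term at Pos; a bad guess yields an error tuple
    %% instead of crashing the process doing the read. The size cap is
    %% a sanity check against garbage length prefixes.
    read_term_at(Fd, Pos) ->
        case file:pread(Fd, Pos, 4) of
            {ok, <<Len:32/integer>>} when Len > 0, Len =< 16777216 ->
                case file:pread(Fd, Pos + 4, Len) of
                    {ok, Bin} when byte_size(Bin) =:= Len ->
                        try
                            {ok, binary_to_term(Bin)}
                        catch
                            error:badarg -> {error, invalid_term}
                        end;
                    _ ->
                        {error, read_beyond_eof}
                end;
            _ ->
                {error, invalid_position}
        end.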


On Tue, Aug 10, 2010 at 10:28 AM, Filipe David Manana <fdmanana@apache.org> wrote:

>
>
> On Tue, Aug 10, 2010 at 9:55 AM, Robert Newson <robert.newson@gmail.com> wrote:
>
>> I ran the db_repair code on a healthy database produced with
>> delayed_commits=true.
>>
>> The source db had 3218 docs. db_repair recovered 3120 and then returned
>> ok.
>>
>
> When a DB is repaired, couch_db_repair:repair/1 returns something matching
> {ok, repaired, _BTreeInfos}.
> If it returns only the atom 'ok', it means it did nothing to the DB file.
> At least that's the behaviour of my original code; dunno if the forks changed it.
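>
> A quick way to tell the two outcomes apart from the Erlang shell (assuming
> my original return values):
>
>     case couch_db_repair:repair("testwritesdb") of
>         {ok, repaired, BTreeInfos} ->
>             io:format("repaired, btree infos: ~p~n", [BTreeInfos]);
>         ok ->
>             io:format("nothing to repair~n")
>     end.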
>
>
>>
>> I'm redoing that test, but this indicates we're not finding all roots.
>>
>> I note that the output file was 36 times the size of the input file, which
>> is a consequence of folding all possible roots. I think that needs to be in
>> the release notes for the repair tool if that behavior remains when it
>> ships.
>>
>> B.
>>
>> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>> > I think I found a bug in the current lost+found repair.
>> >
>> > I've been running it against the testwritesdb and it's in a state where
>> > it never finishes.
>> >
>> > It's still spitting out these lines:
>> >
>> > [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>> >
>> > Most are 1001, but other counts show up at random: 452, 866, etc.
>> >
>> > But the file size and dbinfo haven't budged in over 30 minutes. The size
>> > is stuck at 34300002, with the original db file being 54857478.
>> >
>> > This database only has one document in it that isn't "lost" so if it's
>> > finding *any* new docs it should be writing them.
>> >
>> > I also started another job to recover a production db that is quite
>> > large, 500 MB, with the missing data from a week or so back. This has
>> > been running for 2 hours and still has not output anything or created
>> > the lost+found db, so I can only assume that it is in the same state.
>> >
>> > Both machines are still churning at 100% CPU.
>> >
>> > -Mikeal
>> >
>> >
>> > On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>> >
>> >> With Randall's help we hooked the new node scanner up to the lost+found
>> >> DB generator.  It seems to work well enough for small DBs; for large DBs
>> >> with lots of missing nodes the O(N^2) complexity of the problem catches
>> >> up with the code and generating the lost+found DB takes quite some time.
>> >> Mikeal is running tests tonight.  The algo appears pretty CPU-limited,
>> >> so a little parallelization may be warranted.
>> >>
>> >> http://github.com/kocolosk/couchdb/tree/db_repair
>> >>
>> >> Adam
>> >>
>> >> (I sent this previous update to myself instead of the list, so I'll
>> >> forward it here ...)
>> >>
>> >> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>> >>
>> >> > On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>> >> >
>> >> >> Right, make_lost_and_found still relies on code which reads through
>> >> >> couch_file one byte at a time; that's the cause of the slowness.  The
>> >> >> newer scanner will improve that pretty dramatically, and we can tune
>> >> >> it further by increasing the length of the pattern that we match when
>> >> >> looking for kp/kv_node terms in the files, at the expense of some
>> >> >> extra complexity dealing with the block prefixes (currently it does a
>> >> >> 1-byte match, which as I understand it cannot be split across blocks).
>> >> >
>> >> > The scanner now looks for a 7-byte match, unless it is within 6 bytes
>> >> > of a block boundary, in which case it looks for the longest possible
>> >> > match at that position.  The more specific match condition greatly
>> >> > reduces the number of calls to couch_file, and thus boosts the
>> >> > throughput.  On my laptop it can scan the testwritesdb.couch from
>> >> > Mikeal's couchtest repo (52 MB) in 18 seconds.
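>> >> >
>> >> > (The boundary rule, as a hypothetical sketch: 4096 bytes is CouchDB's
>> >> > on-disk block size, but match_len/1 is an illustrative name, not the
>> >> > scanner's actual code.)
>> >> >
>> >> >     %% Match up to 7 pattern bytes, but never across a block
>> >> >     %% boundary; near one, match only the bytes that still fit.
>> >> >     match_len(AbsPos) ->
>> >> >         BlockSize = 4096,
>> >> >         min(7, BlockSize - (AbsPos rem BlockSize)).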
>> >> >
>> >> >> Regarding the file_corruption error on the larger file, I think this
>> >> >> is something we will just naturally trigger when we take a guess that
>> >> >> random positions in a file are actually the beginning of a term.  I
>> >> >> think our best recourse here is to return {error, file_corruption}
>> >> >> from couch_file but leave the gen_server up and running instead of
>> >> >> terminating it.  That way the repair code can ignore the error and
>> >> >> keep moving without having to reopen the file.
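>> >> >>
>> >> >> (What that buys the repair loop, sketched with a hypothetical
>> >> >> scan_positions/2; it assumes the patched couch_file where a bad read
>> >> >> returns {error, file_corruption} instead of terminating the server:)
>> >> >>
>> >> >>     %% Probe each candidate position: keep decoded terms, skip bad
>> >> >>     %% guesses, and never reopen the file.
>> >> >>     scan_positions(Fd, Positions) ->
>> >> >>         lists:foldl(
>> >> >>             fun(Pos, Acc) ->
>> >> >>                 case couch_file:pread_term(Fd, Pos) of
>> >> >>                     {ok, Term} -> [{Pos, Term} | Acc];
>> >> >>                     {error, _Reason} -> Acc
>> >> >>                 end
>> >> >>             end, [], Positions).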
>> >> >
>> >> > I committed this change (to my db_repair branch) after consulting
>> >> > with Chris.  The longer match condition makes these spurious
>> >> > file_corruption triggers much less likely, but I think it's still a
>> >> > good thing not to crash the server when they happen.
>> >> >
>> >> >> Next steps as I understand them: Randall is working on integrating
>> >> >> the in-memory scanner into Volker's code that finds all the dangling
>> >> >> by_id nodes.  I'm working on making sure that the scanner identifies
>> >> >> bt node candidates which span block prefixes, and on improving its
>> >> >> pattern-matching.
>> >> >
>> >> > Latest from my end
>> >> > http://github.com/kocolosk/couchdb/tree/db_repair
>> >> >
>> >> >>
>> >> >> Adam
>> >> >>
>> >> >> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>> >> >>
>> >> >>> I pulled down the latest code from Adam's branch @
>> >> >>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>> >> >>>
>> >> >>> Running timer:tc(couch_db_repair, make_lost_and_found,
>> >> >>> ["multi_conflict"]). on a database with 200 lost updates spanning
>> >> >>> 200 restarts
>> >> >>> (http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch)
>> >> >>> took about 101 seconds.
>> >> >>>
>> >> >>> I tried running against a larger database
>> >> >>> (http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch)
>> >> >>> and I got this exception:
>> >> >>>
>> >> >>> http://gist.github.com/516491
>> >> >>>
>> >> >>> -Mikeal
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
>> >> >>>
>> >> >>>> Summing up what went on in IRC for those who were absent.
>> >> >>>>
>> >> >>>> The latest progress is on Adam's branch at
>> >> >>>> http://github.com/kocolosk/couchdb/tree/db_repair
>> >> >>>>
>> >> >>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>> >> >>>> lost+found/DbName database to which it merges all nodes not
>> >> >>>> accessible from anywhere (any other node found in a full file scan
>> >> >>>> or any header pointers).
>> >> >>>>
>> >> >>>> Currently, make_lost_and_found uses Volker's repair (from the
>> >> >>>> couch_db_repair_b module, also in Adam's branch).
>> >> >>>> Adam found that the bottleneck was couch_file calls and that the
>> >> >>>> repair process was taking a very long time, so he added
>> >> >>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as
>> >> >>>> binary and tries to process them to find nodes instead of scanning
>> >> >>>> back one byte at a time. It is currently not hooked up to the
>> >> >>>> repair mechanism.
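>> >> >>>>
>> >> >>>> (A rough sketch of the chunked-scan idea, under stated assumptions:
>> >> >>>> read 1MB at a time with file:pread/3 and search each chunk in
>> >> >>>> memory rather than issuing one couch_file call per byte. The names
>> >> >>>> and the overlap handling are illustrative, not the actual
>> >> >>>> find_nodes_quickly/1.)
>> >> >>>>
>> >> >>>>     -define(CHUNK, 1024 * 1024).
>> >> >>>>
>> >> >>>>     %% Return the absolute offsets of Pattern in the file at Path.
>> >> >>>>     %% Chunks overlap by the pattern length so a match straddling
>> >> >>>>     %% a chunk edge is still found (and counted only once).
>> >> >>>>     scan_file(Path, Pattern) ->
>> >> >>>>         {ok, Fd} = file:open(Path, [read, raw, binary]),
>> >> >>>>         try scan_chunks(Fd, Pattern, 0, [])
>> >> >>>>         after file:close(Fd)
>> >> >>>>         end.
>> >> >>>>
>> >> >>>>     scan_chunks(Fd, Pattern, Pos, Acc) ->
>> >> >>>>         case file:pread(Fd, Pos, ?CHUNK + byte_size(Pattern) - 1) of
>> >> >>>>             {ok, Chunk} ->
>> >> >>>>                 Offs = [Pos + Off ||
>> >> >>>>                            {Off, _Len} <- binary:matches(Chunk, Pattern),
>> >> >>>>                            Off < ?CHUNK],
>> >> >>>>                 scan_chunks(Fd, Pattern, Pos + ?CHUNK, Acc ++ Offs);
>> >> >>>>             eof ->
>> >> >>>>                 Acc
>> >> >>>>         end.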
>> >> >>>>
>> >> >>>> Making progress. Go team.
>> >> >>>>
>> >> >>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>> >> >>>>> jchris suggested on IRC that I try a normal doc update and see if
>> >> >>>>> that fixes it.
>> >> >>>>>
>> >> >>>>> It does. After a new doc was created, the dbinfo doc count was
>> >> >>>>> back to normal.
>> >> >>>>>
>> >> >>>>> -Mikeal
>> >> >>>>>
>> >> >>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>>> Ok, I pulled down this code and tested against a database with a
>> >> >>>>>> ton of missing writes right before a single restart.
>> >> >>>>>>
>> >> >>>>>> Before restart this was the database:
>> >> >>>>>>
>> >> >>>>>> {
>> >> >>>>>> db_name: "testwritesdb"
>> >> >>>>>> doc_count: 124969
>> >> >>>>>> doc_del_count: 0
>> >> >>>>>> update_seq: 124969
>> >> >>>>>> purge_seq: 0
>> >> >>>>>> compact_running: false
>> >> >>>>>> disk_size: 54857478
>> >> >>>>>> instance_start_time: "1281384140058211"
>> >> >>>>>> disk_format_version: 5
>> >> >>>>>> }
>> >> >>>>>>
>> >> >>>>>> After restart it was this:
>> >> >>>>>>
>> >> >>>>>> {
>> >> >>>>>> db_name: "testwritesdb"
>> >> >>>>>> doc_count: 1
>> >> >>>>>> doc_del_count: 0
>> >> >>>>>> update_seq: 1
>> >> >>>>>> purge_seq: 0
>> >> >>>>>> compact_running: false
>> >> >>>>>> disk_size: 54857478
>> >> >>>>>> instance_start_time: "1281384593876026"
>> >> >>>>>> disk_format_version: 5
>> >> >>>>>> }
>> >> >>>>>>
>> >> >>>>>> After repair, it's this:
>> >> >>>>>>
>> >> >>>>>> {
>> >> >>>>>> db_name: "testwritesdb"
>> >> >>>>>> doc_count: 1
>> >> >>>>>> doc_del_count: 0
>> >> >>>>>> update_seq: 124969
>> >> >>>>>> purge_seq: 0
>> >> >>>>>> compact_running: false
>> >> >>>>>> disk_size: 54857820
>> >> >>>>>> instance_start_time: "1281385990193289"
>> >> >>>>>> disk_format_version: 5
>> >> >>>>>> committed_update_seq: 124969
>> >> >>>>>> }
>> >> >>>>>>
>> >> >>>>>> All the sequences are there and hitting _all_docs shows all the
>> >> >>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>> >> >>>>>>
>> >> >>>>>> -Mikeal
>> >> >>>>>>
>> >> >>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana <fdmanana@apache.org> wrote:
>> >> >>>>>>
>> >> >>>>>>> For the record (and people not on IRC), the code at:
>> >> >>>>>>>
>> >> >>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>> >> >>>>>>>
>> >> >>>>>>> is working for at least simple cases. Use
>> >> >>>>>>> couch_db_repair:repair(DbNameAsString).
>> >> >>>>>>> There's one TODO: update the reduce values for the by_seq and
>> >> >>>>>>> by_id BTrees.
>> >> >>>>>>>
>> >> >>>>>>> If anyone wants to give some help on this, you're welcome.
>> >> >>>>>>>
>> >> >>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> I'm starting to create a bunch of test db files that expose
>> >> >>>>>>>> this bug under different conditions, like multiple restarts,
>> >> >>>>>>>> across compaction, variances in updates that might cause
>> >> >>>>>>>> conflicts, etc.
>> >> >>>>>>>>
>> >> >>>>>>>> http://github.com/mikeal/couchtest
>> >> >>>>>>>>
>> >> >>>>>>>> The README outlines what was done to the dbs and what needs to
>> >> >>>>>>>> be recovered.
>> >> >>>>>>>>
>> >> >>>>>>>> -Mikeal
>> >> >>>>>>>>
>> >> >>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <fdmanana@apache.org> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <robert.newson@gmail.com> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>> Doesn't this bit:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> -        Db#db{waiting_delayed_commit=nil};
>> >> >>>>>>>>>> +        Db;
>> >> >>>>>>>>>> +        % Db#db{waiting_delayed_commit=nil};
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> revert the bug fix?
>> >> >>>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> That's intentional, for my local testing.
>> >> >>>>>>>>> That patch obviously isn't anything close to final; it's
>> >> >>>>>>>>> still too experimental.
>> >> >>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> B.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <jan@apache.org> wrote:
>> >> >>>>>>>>>>> Hi All,
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but
>> >> >>>>>>>>>>> he isn't done yet.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Here's the current patch:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> It is not done and very early, but any help on this is
>> >> >>>>>>>>>>> greatly appreciated.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> The current state is (in Filipe's words):
>> >> >>>>>>>>>>> - I can detect that a file needs repair
>> >> >>>>>>>>>>> - and get the last btree roots from it
>> >> >>>>>>>>>>> - "only" missing: get the last db seq num
>> >> >>>>>>>>>>> - write a new header
>> >> >>>>>>>>>>> - and deal with the local docs btree (if it exists)
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Thanks!
>> >> >>>>>>>>>>> Jan
>> >> >>>>>>>>>>> --
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> --
>> >> >>>>>>>>> Filipe David Manana,
>> >> >>>>>>>>> fdmanana@apache.org
>> >> >>>>>>>>>
>> >> >>>>>>>>> "Reasonable men adapt themselves to the world.
>> >> >>>>>>>>> Unreasonable men adapt the world to themselves.
>> >> >>>>>>>>> That's why all progress depends on unreasonable men."
>> >> >>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> Filipe David Manana,
>> >> >>>>>>> fdmanana@apache.org
>> >> >>>>>>>
>> >> >>>>>>> "Reasonable men adapt themselves to the world.
>> >> >>>>>>> Unreasonable men adapt the world to themselves.
>> >> >>>>>>> That's why all progress depends on unreasonable men."
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
>
>
>
> --
> Filipe David Manana,
> fdmanana@apache.org
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
>
>


-- 
Filipe David Manana,
fdmanana@apache.org

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
