incubator-couchdb-dev mailing list archives

From Mikeal Rogers <mikeal.rog...@gmail.com>
Subject Re: data recovery tool progress
Date Tue, 10 Aug 2010 20:34:05 GMT
I have some timing numbers for the new code.

multi_conflict has 200 lost documents and 201 documents total after
recovery.
1> timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
{25217069,ok}
25 seconds

Something funky is going on here. Investigating.
1> timer:tc(couch_db_repair, make_lost_and_found,
["multi_conflict_with_attach"]).
{654782,ok}
0.6 seconds

This db has 124969 documents in it.
1> timer:tc(couch_db_repair, make_lost_and_found, ["testwritesdb"]).
{1381969304,ok}
23 minutes

This database is about 500 MB, with 46660 documents before recovery and 46801 after.
1> timer:tc(couch_db_repair, make_lost_and_found, ["prod"]).
{2329669113,ok}
38.8 minutes
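
For anyone checking the math: timer:tc/3 returns {Microseconds, Result}, so the
minute figures above are just the first element converted by hand. A quick shell
sketch using the "prod" run as an example:

1> 2329669113 / 1000000 / 60.   % microseconds -> seconds -> minutes, ~38.8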

-Mikeal

On Tue, Aug 10, 2010 at 12:06 PM, Adam Kocoloski <kocolosk@apache.org> wrote:

> Good idea.  Now we've got
>
> > [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576
> bytes at 1380102
> > [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576
> bytes at 331526
> > [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 331526
> bytes at 0
> > [info] [<0.33.0>] couch_db_repair writing 12 updates to
> lost+found/testwritesdb
> > [info] [<0.33.0>] couch_db_repair writing 9 updates to
> lost+found/testwritesdb
> > [info] [<0.33.0>] couch_db_repair writing 8 updates to
> lost+found/testwritesdb
>
> Adam
>
> On Aug 10, 2010, at 2:29 PM, Robert Newson wrote:
>
> > It took 20 minutes before the first 'update' line came out, but now
> > it seems to be recovering smoothly. Machine load is back down to sane
> > levels.
> >
> > I'd suggest printing some progress feedback during the hunting phase.
> >
> > B.
> >
> > On Tue, Aug 10, 2010 at 7:11 PM, Adam Kocoloski <kocolosk@apache.org>
> wrote:
> >> Thanks for the crosscheck.  I'm not aware of anything in the node finder
> that would cause it to struggle mightily with healthy DBs.  It pretty much
> ignores the health of the DB, in fact.  Would be interested to hear more.
> >>
> >> On Aug 10, 2010, at 1:59 PM, Robert Newson wrote:
> >>
> >>> I verified the new code's ability to repair the testwritesdb. System
> >>> load was smooth from start to finish.
> >>>
> >>> I started a further test on a different (healthy) database and system
> >>> load was severe again, just collecting the roots (the lost+found db
> >>> was not yet created when I aborted the attempt). I suspect the fact
> >>> that it's healthy is the issue, so if I'm right, perhaps a warning is
> >>> useful.
> >>>
> >>> B.
> >>>
> >>>
> >>>
> >>> On Tue, Aug 10, 2010 at 6:53 PM, Adam Kocoloski <kocolosk@apache.org>
> wrote:
> >>>> Another update.  This morning I took a different tack and, rather than
> try to find root nodes, I just looked for all kv_nodes in the file and
> treated each of those as a separate virtual DB to be replicated.  This
> reduces the algorithmic complexity of the repair, and it looks like
> testwritesdb repairs in ~30 minutes or so.  Also, this method results in the
> lost+found DB containing every document, not just the missing ones.
> >>>>
> >>>> My branch does not currently include Randall's parallelization of the
> replications.  It's still CPU-limited, so that may be a worthwhile
> optimization.  On the other hand, I think we may be reaching a stage at
> which performance for this repair tool is 'good enough', and pmaps can make
> error handling a bit dicey.
> >>>>
> >>>> In short, I think this tool is now in good shape.
> >>>>
> >>>> http://github.com/kocolosk/couchdb/tree/db_repair
> >>>>
> >>
> >>
>
>
