incubator-couchdb-dev mailing list archives

From J Chris Anderson <jch...@apache.org>
Subject Re: data recovery tool progress
Date Thu, 12 Aug 2010 21:33:10 GMT

On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:

> 
> On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
> 
>> Right, and jchris' db_repair branch includes my patches for DB reader _admin access
>> and a more useful progress report in the replication phase of the repair.
>> 
> 
> I've updated the repair branch with everyone's code. I think it is faster, due to Adam's
> idea that if we run the merges in reverse order, those near the front of the file are more
> likely to be no-ops, so less work is done overall.
> 
> Mikeal will be testing for correctness. Could others please use it and test for usability
> as well? Latest code (with instructions) is here:
> 
> http://github.com/jhs/recover-couchdb/
> 
> Which points at http://github.com/jchris/couchdb/tree/db_repair for the repair code.
> 
> One thing I am not clear about (need better docs) is, do we need to replicate the original
> db to the lost+found db (or vice-versa), after recovery is complete?
> 

Also, we should be clear about what the semantics for this are. It can potentially introduce
conflicts if some writes were repeated after restarts. Should it always be a noop on dbs that
are clean w/r/t the bug?

Chris

> Chris
> 
>> Adam
>> 
>> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>> 
>>> The code is updated with the following changes:
>>> 1. Adhere to the lost+found/databasename custom...
>>> 2. ...except databases starting with _, which go into
>>> _system/databasename
>>> 3. Sync up with jchris's db_repair branch
>>> 
>>> (About #2, I started with _/database but I think it's too easy to miss at
>>> the command line.)
>>> 
>>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <jchris@gmail.com> wrote:
>>> 
>>>> A few bug reports from my testing:
>>>> 
>>>> I launched with this command, as specified in the README:
>>>> 
>>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
>>>> 
>>>> 
>>>> 
>>>> First of all, it chokes on my _users and _replicator dbs:
>>>> 
>>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
>>>>                                   {error,illegal_database_name}}
>>>> 
>>>> That second [error] line is repeated many many times (once per merge I
>>>> think). I think the issue is that _users is hard-coded to be OK, but
>>>> _users_lost+found is not. So we should do something about that, maybe if a
>>>> db-name starts with _ we should call the lost and found a_users_lost+found
>>>> (_ sorts at the top, so "a" will be near it and legal).
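
The naming problem described above can be sketched in a few lines. This is only an illustration, not the repair tool's code: it assumes the documented CouchDB rule that a database name must start with a lowercase letter and may otherwise contain a-z, 0-9 and the characters _ $ ( ) + - /, and the helper names `is_legal_dbname` and `lost_and_found_name` are hypothetical.

```python
import re

# Assumed rule (from the CouchDB documentation): first character a-z,
# then any of a-z, 0-9, _, $, (, ), +, -, /.
DBNAME_RE = re.compile(r"[a-z][a-z0-9_$()+/-]*")


def is_legal_dbname(name: str) -> bool:
    """Return True if the whole name satisfies the assumed naming rule."""
    return DBNAME_RE.fullmatch(name) is not None


def lost_and_found_name(dbname: str) -> str:
    """Hypothetical helper: build the lost+found name, prefixing "a"
    when the source database starts with an underscore."""
    target = dbname + "_lost+found"
    if dbname.startswith("_"):
        target = "a" + target
    return target


if __name__ == "__main__":
    for db in ("bench", "_users", "_replicator"):
        target = lost_and_found_name(db)
        print(db, "->", target, "legal:", is_legal_dbname(target))
```

Under these assumptions `_users_lost+found` fails the check because of its leading underscore, while `a_users_lost+found` passes.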
>>>> 
>>>> 
>>>> 
>>>> When a database has readers defined in the security object, the tool is
>>>> unable to open it (the reading part of the repair tool needs to have the
>>>> _admin userCtx, not just the writer).
>>>> 
>>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs Names [<<"joe">>] Roles [<<"_admin">>]
>>>> escript: exception throw: {unauthorized,<<"You are not authorized to access this db.">>}
>>>> in function  couch_db:open/2
>>>> in call from couch_db_repair:make_lost_and_found/3
>>>> in call from recover_couchdb:main/1
>>>> in call from escript:run/2
>>>> in call from escript:start/1
>>>> in call from init:start_it/1
>>>> in call from init:start_em/1
>>>> 
>>>> 
>>>> It would also be helpful if the status lines could say something more than
>>>> 
>>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>> 
>>>> Like maybe add a note like "about 23% complete" if at all possible.
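
One possible shape for such a note, as a rough sketch only: it assumes the tool can count how many merges it has finished and how many it found in total, which the thread does not confirm. The function name `progress_note` and both parameters are hypothetical.

```python
def progress_note(done: int, total: int) -> str:
    """Format a rough completion note from a finished/total count.

    `done` and `total` are assumed counters (e.g. merges processed so far
    and merges discovered during the scan); they are not taken from the
    actual repair code.
    """
    if total <= 0:
        return "progress unknown"
    # Clamp to 0..100 so a miscounted total never prints a nonsense value.
    pct = min(100, max(0, round(100 * done / total)))
    return f"about {pct}% complete"


if __name__ == "__main__":
    # Example: appended to a status line in the style quoted above.
    print("couch_db_repair writing 15 updates to bench_lost+found ("
          + progress_note(23, 100) + ")")
```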
>>>> 
>>>> 
>>>> I will patch the first few, I'd love help from someone on the last one.
>>>> I'll be on IRC.
>>>> 
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>> 
>>>>> 
>>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>> 
>>>>>> Hi, Jason.
>>>>>> 
>>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <jhs@couch.io> wrote:
>>>>>> 
>>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <kocolosk@apache.org> wrote:
>>>>>>> 
>>>>>>>> Excellent, thanks for testing.  I caught Jason Smith saying on IRC that he
>>>>>>>> had packaged the whole thing up as an escript + some .beams.  If we can get
>>>>>>>> it down to a single file a la rebar that would be a pretty sweet way to
>>>>>>>> deliver the repair tool in my opinion.
>>>>>>>> 
>>>>>>> 
>>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>> 
>>>>>> 
>>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>> 
>>>>> 
>>>>> I think it is important that we package and release this, if it is ready.
>>>>> We should link to it from the bug description page, the project home page,
>>>>> as well as blog about it, etc. What is the point of working feverishly on a
>>>>> recovery tool if we don't go the last mile?
>>>>> 
>>>>> I am testing it now on my database directory to make sure it doesn't harm
>>>>> anything (I was never subject to the bug, which is probably where most
>>>>> people are, but they might run it anyway.)
>>>>> 
>>>>> As it stands the submodules thing can't be part of the release, we need
>>>>> to package it up as a single zip file or something.
>>>>> 
>>>>> Is there anything else that needs to be done before we can release this?
>>>>> 
>>>>> Chris
>>>>> 
>>>>>> --
>>>>>> Jason Smith
>>>>>> Couchio Hosting
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Jason Smith
>>> Couchio Hosting
>> 
> 

