couchdb-dev mailing list archives

From J Chris Anderson <jch...@apache.org>
Subject Re: data recovery tool progress
Date Fri, 13 Aug 2010 14:05:08 GMT

On Aug 12, 2010, at 11:38 PM, Mikeal Rogers wrote:

> I tested the latest code in recover-couchdb and it looks great.

We need to package this so that it is usable by end users, and put a link to it on http://couchdb.apache.org/notice/1.0.1.html

I'm the last guy who knows what that would mean... anyone? I think we should do this today.

Do we need to do anything formal and time-consuming before linking to the recovery
tool/process from that page?

Also, someone needs to write up the how-to instructions, along with a description of what
to expect.

Chris

> 
> -Mikeal
> 
> On Thu, Aug 12, 2010 at 2:33 PM, J Chris Anderson <jchris@apache.org> wrote:
> 
>> 
>> On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:
>> 
>>> 
>>> On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
>>> 
>>>> Right, and jchris' db_repair branch includes my patches for DB reader
>>>> _admin access and a more useful progress report in the replication phase
>>>> of the repair.
>>>> 
>>> 
>>> I've updated the repair branch with everyone's code. I think it is
>>> faster, due to Adam's idea that if we run the merges in reverse order,
>>> those near the front of the file are more likely to be no-ops, so less
>>> work is done overall.
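A toy model of why reverse order saves work, written as a bash sketch (bash 4+ for associative arrays). It is not the couch_db_repair code: the node list, the id:rev encoding and the count_writes helper are all hypothetical. It only assumes that the file is append-only, so newer revisions sit nearer the end, and that merging a revision the target already holds at the same or a newer rev is a no-op.

```shell
#!/usr/bin/env bash
# Toy model only (hypothetical data, not couch_db_repair itself).
# Each "node" lists doc revisions as id:rev, ordered from the front of the
# file to the end. In an append-only file, newer revisions sit nearer the end.
nodes=("a:1 b:1" "a:2 b:1 c:1" "b:2 c:1 d:1")

# count_writes ORDER: print how many id:rev pairs would really be written
# when nodes are merged in ORDER (forward or reverse). A pair is a no-op
# when the target already holds that doc at the same or a newer revision.
count_writes() {
  local order=$1 i pair id rev writes=0
  local -A have=()
  local idx=()
  if [[ $order == reverse ]]; then
    for (( i = ${#nodes[@]} - 1; i >= 0; i-- )); do idx+=("$i"); done
  else
    for (( i = 0; i < ${#nodes[@]}; i++ )); do idx+=("$i"); done
  fi
  for i in "${idx[@]}"; do
    for pair in ${nodes[i]}; do       # unquoted on purpose: split on spaces
      id=${pair%%:*}
      rev=${pair##*:}
      if (( rev > ${have[$id]:-0} )); then
        have[$id]=$rev
        writes=$(( writes + 1 ))
      fi
    done
  done
  echo "$writes"
}

echo "forward: $(count_writes forward) writes"
echo "reverse: $(count_writes reverse) writes"
```

With this made-up data the forward pass performs 6 writes and the reverse pass 4, and in the reverse pass the front node is entirely a no-op.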
>>> 
>>> Mikeal will be testing for correctness. Could others please use it and
>>> test for usability as well? Latest code (with instructions) is here:
>>> 
>>> http://github.com/jhs/recover-couchdb/
>>> 
>>> Which points at http://github.com/jchris/couchdb/tree/db_repair for the
>>> repair code.
>>> 
>>> One thing I am not clear about (we need better docs): do we need to
>>> replicate the original db to the lost+found db (or vice versa) after
>>> recovery is complete?
>>> 
>> 
>> Also, we should be clear about what the semantics for this are. It can
>> potentially introduce conflicts if some writes were repeated after restarts.
>> Should it always be a noop on dbs that are clean w/r/t the bug?
>> 
>> Chris
>> 
>>> Chris
>>> 
>>>> Adam
>>>> 
>>>> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>>>> 
>>>>> The code is updated with the following changes:
>>>>> 1. Adhere to the lost+found/databasename custom...
>>>>> 2. ...except databases starting with _, which go into
>>>>> _system/databasename
>>>>> 3. Sync up with jchris's db_repair branch
>>>>> 
>>>>> (About #2, I started with _/database but I think it's too easy to miss
>>>>> at the command line.)
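A minimal bash sketch of the target-naming convention in points 1 and 2 above. The function name lost_found_name is hypothetical, and it assumes the original name, including its leading underscore, is kept after the _system/ prefix; the message does not say whether the underscore is dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a source database name to its recovery target.
#   names starting with "_"  -> _system/<name>
#   everything else          -> lost+found/<name>
# Assumption: the full original name (leading "_" included) is kept.
lost_found_name() {
  local db=$1
  case $db in
    _*) printf '_system/%s\n' "$db" ;;
    *)  printf 'lost+found/%s\n' "$db" ;;
  esac
}

lost_found_name bench    # prints lost+found/bench
lost_found_name _users   # prints _system/_users
```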
>>>>> 
>>>>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <jchris@gmail.com> wrote:
>>>>> 
>>>>>> A few bug reports from my testing:
>>>>>> 
>>>>>> I launched with this command, as specified in the README:
>>>>>> 
>>>>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> First of all, it chokes on my _users and _replicator db:
>>>>>> 
>>>>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>>>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
>>>>>>                                  {error,illegal_database_name}}
>>>>>> 
>>>>>> That second [error] line is repeated many, many times (once per merge,
>>>>>> I think). I think the issue is that _users is hard-coded to be OK, but
>>>>>> _users_lost+found is not. So we should do something about that: maybe
>>>>>> if a db name starts with _ we should call the lost and found db
>>>>>> a_users_lost+found (_ sorts at the top, so "a" will be near it and
>>>>>> legal).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> When a database has readers defined in the security object, the tool
>>>>>> is unable to open it (the reading part of the repair tool needs to
>>>>>> have the _admin userCtx, not just the writer).
>>>>>> 
>>>>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs Names [<<"joe">>] Roles [<<"_admin">>]
>>>>>> escript: exception throw: {unauthorized,<<"You are not authorized to access this db.">>}
>>>>>> in function  couch_db:open/2
>>>>>> in call from couch_db_repair:make_lost_and_found/3
>>>>>> in call from recover_couchdb:main/1
>>>>>> in call from escript:run/2
>>>>>> in call from escript:start/1
>>>>>> in call from init:start_it/1
>>>>>> in call from init:start_em/1
>>>>>> 
>>>>>> 
>>>>>> It would also be helpful if the status lines could say something
>>>>>> more than
>>>>>> 
>>>>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>>>> 
>>>>>> Like maybe add a note like "about 23% complete" if at all possible.
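One way such a status line could estimate progress, sketched in bash: divide the bytes covered so far by the file size, which the [info] line already reports ("scanning 335961 bytes"). The helper percent_done is hypothetical, and so is the assumption that the tool knows the offset of the node it is merging. If merges run from the end of the file toward the front, as Adam suggests elsewhere in this thread, the covered span is the size minus the offset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent_done SIZE OFFSET [forward|reverse]
# Prints the integer percentage of the file already covered by the pass.
percent_done() {
  local size=$1 offset=$2 direction=${3:-forward}
  local covered=$offset
  # Working from the end of the file back toward offset 0 means the bytes
  # already handled are those after the current offset.
  if [[ $direction == reverse ]]; then
    covered=$(( size - offset ))
  fi
  echo $(( covered * 100 / size ))
}

percent_done 335961 332061          # prints 98
percent_done 335961 332061 reverse  # prints 1
```

Bash integer division rounds down, which is good enough for an "about N% complete" message.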
>>>>>> 
>>>>>> 
>>>>>> I will patch the first few; I'd love help from someone on the last
>>>>>> one. I'll be on IRC.
>>>>>> 
>>>>>> 
>>>>>> Cheers,
>>>>>> Chris
>>>>>> 
>>>>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>>>> 
>>>>>>>> Hi, Jason.
>>>>>>>> 
>>>>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <jhs@couch.io> wrote:
>>>>>>>> 
>>>>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <kocolosk@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Excellent, thanks for testing. I caught Jason Smith saying on IRC
>>>>>>>>>> that he had packaged the whole thing up as an escript + some
>>>>>>>>>> .beams. If we can get it down to a single file à la rebar, that
>>>>>>>>>> would be a pretty sweet way to deliver the repair tool, in my
>>>>>>>>>> opinion.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>>>> 
>>>>>>> 
>>>>>>> I think it is important that we package and release this, if it is
>>>>>>> ready. We should link to it from the bug description page and the
>>>>>>> project home page, as well as blog about it, etc. What is the point of
>>>>>>> working feverishly on a recovery tool if we don't go the last mile?
>>>>>>> 
>>>>>>> I am testing it now on my database directory to make sure it doesn't
>>>>>>> harm anything. (I was never subject to the bug, which is probably
>>>>>>> where most people are, but they might run it anyway.)
>>>>>>> 
>>>>>>> As it stands, the submodules thing can't be part of the release; we
>>>>>>> need to package it up as a single zip file or something.
>>>>>>> 
>>>>>>> Is there anything else that needs to be done before we can release
>>>>>>> this?
>>>>>>> 
>>>>>>> Chris
>>>>>>> 
>>>>>>>> --
>>>>>>>> Jason Smith
>>>>>>>> Couchio Hosting
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Jason Smith
>>>>> Couchio Hosting
>>>> 
>>> 
>> 
>> 

