couchdb-dev mailing list archives

From J Chris Anderson <jch...@apache.org>
Subject Re: data recovery tool progress
Date Fri, 13 Aug 2010 16:26:15 GMT
Here is my first pass at notes on the repair tool.

I'd like to get this on the Apache website today so we can publicize it:

http://wiki.couchone.com/page/repair-tool

Please read, test, edit, give feedback, etc.

Thanks,
Chris

On Aug 13, 2010, at 7:05 AM, J Chris Anderson wrote:

> 
> On Aug 12, 2010, at 11:38 PM, Mikeal Rogers wrote:
> 
>> I tested the latest code in recover-couchdb and it looks great.
> 
> We need to package this so that it is usable by end-users, and put a link to it on http://couchdb.apache.org/notice/1.0.1.html
> 
> I'm the last guy who knows what that would mean... anyone? I think we should do this today.
> 
> Do we need to do anything formal and time-consuming before linking to the recovery tool / process from that page?
> 
> Also, someone needs to write up the how-to instructions, along with a description of what to expect.
> 
> Chris
> 
>> 
>> -Mikeal
>> 
>> On Thu, Aug 12, 2010 at 2:33 PM, J Chris Anderson <jchris@apache.org> wrote:
>> 
>>> 
>>> On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:
>>> 
>>>> 
>>>> On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
>>>> 
>>>>> Right, and jchris' db_repair branch includes my patches for DB reader _admin access and a more useful progress report in the replication phase of the repair.
>>>>> 
>>>> 
>>>> I've updated the repair branch with everyone's code. I think it is faster, due to Adam's idea that if we run the merges in reverse order, those near the front of the file are more likely to be no-ops, so less work is done overall.
>>>> 
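A toy model of Adam's reverse-order idea, under the simplifying assumption that the file is append-only and each later btree root found during the scan already contains everything the earlier roots did. The function and data below are hypothetical illustrations, not couch_db_repair code. They show why merging newest-first puts nearly all the work into the first merge, so the merges for roots near the front of the file tend to become no-ops.

```python
from typing import List, Set


def merge_roots(roots: List[Set[str]], reverse: bool) -> List[int]:
    """Merge each root's entries into one target set (a stand-in for the
    lost+found db) and return how many new entries each merge added.

    Hypothetical toy model: `roots` are listed in file order (front of the
    file first), and a merge that adds 0 entries counts as a no-op.
    """
    target: Set[str] = set()
    added_per_merge: List[int] = []
    order = reversed(roots) if reverse else roots
    for root in order:
        new_entries = root - target  # entries the target does not have yet
        target |= new_entries
        added_per_merge.append(len(new_entries))
    return added_per_merge


# Append-only growth: each later root is a superset of the earlier ones.
roots = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
print(merge_roots(roots, reverse=False))  # [1, 1, 1]: every merge does work
print(merge_roots(roots, reverse=True))   # [3, 0, 0]: front-of-file merges are no-ops
```

In a real damaged file, roots written after a restart are not strict supersets of the earlier ones, so some front-of-file merges will still add data; the claim is only that they are more likely to be no-ops.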
>>>> Mikeal will be testing for correctness. Could others please use it and test it for usability as well? Latest code (with instructions) is here:
>>>> 
>>>> http://github.com/jhs/recover-couchdb/
>>>> 
>>>> Which points at http://github.com/jchris/couchdb/tree/db_repair for the repair code.
>>>> 
>>>> One thing I am not clear about (we need better docs): do we need to replicate the original db to the lost+found db (or vice versa) after recovery is complete?
>>>> 
>>> 
>>> Also, we should be clear about what the semantics for this are. It can potentially introduce conflicts if some writes were repeated after restarts. Should it always be a no-op on dbs that are clean w/r/t the bug?
>>> 
>>> Chris
>>> 
>>>> Chris
>>>> 
>>>>> Adam
>>>>> 
>>>>> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>>>>> 
>>>>>> The code is updated with the following changes:
>>>>>> 1. Adhere to the lost+found/databasename custom...
>>>>>> 2. ...except databases starting with _, which go into
>>>>>> _system/databasename
>>>>>> 3. Sync up with jchris's db_repair branch
>>>>>> 
>>>>>> (About #2, I started with _/database but I think it's too easy to miss at the command line.)
>>>>>> 
>>>>>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <jchris@gmail.com> wrote:
>>>>>> 
>>>>>>> A few bug reports from my testing:
>>>>>>> 
>>>>>>> I launched with this command, as specified in the README:
>>>>>>> 
>>>>>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
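For anyone who prefers to script this step instead of using find, here is a minimal Python sketch of the same loop. It assumes, as the README command does, that ./recover_couchdb is run from the current directory and takes one .couch file path as its only argument; the helper names find_couch_files and recover_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_couch_files(root: str) -> List[Path]:
    """Return every regular file under `root` whose name ends in .couch,
    similar to `find ROOT -type f -name '*.couch'` (sorted for a stable order)."""
    base = Path(root).expanduser()
    return sorted(p for p in base.rglob("*.couch") if p.is_file())


def recover_all(root: str, tool: str = "./recover_couchdb") -> int:
    """Run the recovery tool once per database file, one process per file
    like find's `-exec ... {} ;` form. Returns how many runs exited non-zero."""
    failures = 0
    for db_file in find_couch_files(root):
        result = subprocess.run([tool, str(db_file)])
        if result.returncode != 0:
            print(f"recovery failed for {db_file}", file=sys.stderr)
            failures += 1
    return failures
```

Calling recover_all("~/code/couchdb/tmp/lib") mirrors the README command above. Unlike the find one-liner, it reports which files failed instead of silently moving on.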
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> First of all, it chokes on my _users and _replicator dbs:
>>>>>>> 
>>>>>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>>>>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
>>>>>>>                                 {error,illegal_database_name}}
>>>>>>> 
>>>>>>> That second [error] line is repeated many, many times (once per merge, I think). I think the issue is that _users is hard-coded to be OK, but _users_lost+found is not. So we should do something about that: maybe if a db name starts with _, we should call the lost and found a_users_lost+found (_ sorts at the top, so "a" will be near it and legal).
>>>>>>> 
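The illegal_database_name error fits CouchDB's database-name rule: a name must start with a lowercase letter and may then contain only lowercase letters, digits, and the characters _ $ ( ) + - /, while system databases are accepted by name. A small Python checker, assuming that rule and treating _users and _replicator as the only hard-coded exceptions (an assumption made for this sketch), shows why _users_lost+found is rejected while Chris's a_users_lost+found, or a lost+found/ prefix, passes. The function name is hypothetical.

```python
import re

# Assumed rule: first character is a lowercase letter; the rest are lowercase
# letters, digits, or any of _ $ ( ) + - /
VALID_DB_NAME = re.compile(r"[a-z][a-z0-9_$()+/-]*")

# Assumed hard-coded system databases that bypass the pattern.
SYSTEM_DBS = {"_users", "_replicator"}


def is_legal_db_name(name: str) -> bool:
    """Return True if `name` would be accepted under the assumed rule."""
    return name in SYSTEM_DBS or VALID_DB_NAME.fullmatch(name) is not None


for candidate in ["_users", "_users_lost+found", "a_users_lost+found", "lost+found/_users"]:
    print(candidate, is_legal_db_name(candidate))
# _users True
# _users_lost+found False
# a_users_lost+found True
# lost+found/_users True
```

Appending a suffix keeps the leading underscore, which is why the suffix approach fails for system databases; any scheme that puts a lowercase letter first avoids the error.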
>>>>>>> 
>>>>>>> 
>>>>>>> When a database has readers defined in the security object, the tool is unable to open it (the reading part of the repair tool needs to have the _admin userCtx, not just the writer).
>>>>>>> 
>>>>>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs Names [<<"joe">>] Roles [<<"_admin">>]
>>>>>>> escript: exception throw: {unauthorized,<<"You are not authorized to access this db.">>}
>>>>>>> in function  couch_db:open/2
>>>>>>> in call from couch_db_repair:make_lost_and_found/3
>>>>>>> in call from recover_couchdb:main/1
>>>>>>> in call from escript:run/2
>>>>>>> in call from escript:start/1
>>>>>>> in call from init:start_it/1
>>>>>>> in call from init:start_em/1
>>>>>>> 
>>>>>>> 
>>>>>>> It would also be helpful if the status lines could say something more than
>>>>>>> 
>>>>>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>>>>> 
>>>>>>> Maybe add a note like "about 23% complete", if at all possible.
>>>>>>> 
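One rough way to produce an "about N% complete" note is to compare each merge offset with the file size reported in the scanning line. Below is a minimal Python sketch over the log format quoted above. It assumes the scan covers `total` bytes starting at the logged offset and that merge offsets are plain byte positions in that range. The helper name and the `reverse` flag (for Adam's reverse-order merges, where progress runs from the end of the file toward the front) are hypothetical, not part of the tool.

```python
import re
from typing import Optional

SCAN_RE = re.compile(r"scanning (\d+) bytes at (\d+)")
MERGE_RE = re.compile(r"merge node at (\d+)")


def progress_note(scan_line: str, merge_line: str, reverse: bool = False) -> Optional[str]:
    """Return e.g. 'about 23% complete', or None if either line does not match."""
    scan = SCAN_RE.search(scan_line)
    merge = MERGE_RE.search(merge_line)
    if scan is None or merge is None:
        return None
    total, start = int(scan.group(1)), int(scan.group(2))
    if total <= 0:
        return None
    offset = int(merge.group(1))
    # Forward merges move from `start` toward the end of the file;
    # reverse merges begin at the end and move back toward `start`.
    done = (start + total - offset) if reverse else (offset - start)
    percent = max(0, min(100, round(100 * done / total)))
    return f"about {percent}% complete"


scan = "[info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0"
merge = "[error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,"
print(progress_note(scan, merge))                # about 99% complete
print(progress_note(scan, merge, reverse=True))  # about 1% complete
```

Byte offsets only approximate the remaining work, since merges near the front of the file are often no-ops, so the figure is best treated as a rough estimate.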
>>>>>>> 
>>>>>>> I will patch the first few; I'd love help from someone on the last one.
>>>>>>> I'll be on IRC.
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>>>>> 
>>>>>>>>> Hi, Jason.
>>>>>>>>> 
>>>>>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <jhs@couch.io> wrote:
>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <kocolosk@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Excellent, thanks for testing. I caught Jason Smith saying on IRC that he had packaged the whole thing up as an escript + some .beams. If we can get it down to a single file a la rebar, that would be a pretty sweet way to deliver the repair tool, in my opinion.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> I think it is important that we package and release this, if it is ready. We should link to it from the bug description page and the project home page, as well as blog about it, etc. What is the point of working feverishly on a recovery tool if we don't go the last mile?
>>>>>>>> 
>>>>>>>> I am testing it now on my database directory to make sure it doesn't harm anything (I was never subject to the bug, which is probably where most people are, but they might run it anyway).
>>>>>>>> 
>>>>>>>> As it stands, the submodules thing can't be part of the release; we need to package it up as a single zip file or something.
>>>>>>>> 
>>>>>>>> Is there anything else that needs to be done before we can release this?
>>>>>>>> 
>>>>>>>> Chris
>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Jason Smith
>>>>>>>>> Couchio Hosting
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Jason Smith
>>>>>> Couchio Hosting
>>>>> 
>>>> 
>>> 
>>> 
> 

