couchdb-dev mailing list archives

From J Chris Anderson <jch...@apache.org>
Subject Re: data recovery tool progress
Date Fri, 13 Aug 2010 18:05:34 GMT

On Aug 13, 2010, at 9:26 AM, J Chris Anderson wrote:

> Here is my first pass at notes on the repair tool.
> 
> I'd like to get this on the Apache website today so we can publicize it:
> 
> http://wiki.couchone.com/page/repair-tool
> 

Sitting on our hands wasn't doing anyone any good, so I tweeted this as CouchDB.

http://twitter.com/CouchDB/status/21083958040

I still think we should put this on the Apache site and give it a proper announcement.

Chris

> Please read, test, edit, give feedback, etc.
> 
> Thanks,
> Chris
> 
> On Aug 13, 2010, at 7:05 AM, J Chris Anderson wrote:
> 
>> 
>> On Aug 12, 2010, at 11:38 PM, Mikeal Rogers wrote:
>> 
>>> I tested the latest code in recover-couchdb and it looks great.
>> 
>> We need to package this so that it is usable by end users, and put a link to it
>> on http://couchdb.apache.org/notice/1.0.1.html
>> 
>> I'm the last guy who knows what that would mean... anyone? I think we should do this
>> today.
>> 
>> Do we need to do anything formal and time-consuming before linking to the recovery
>> tool / process from that page?
>> 
>> Also, someone needs to write up the how-to instructions, along with a description
>> of what to expect.
>> 
>> Chris
>> 
>>> 
>>> -Mikeal
>>> 
>>> On Thu, Aug 12, 2010 at 2:33 PM, J Chris Anderson <jchris@apache.org> wrote:
>>> 
>>>> 
>>>> On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:
>>>> 
>>>>> 
>>>>> On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
>>>>> 
>>>>>> Right, and jchris' db_repair branch includes my patches for DB reader
>>>>>> _admin access and a more useful progress report in the replication phase of
>>>>>> the repair.
>>>>>> 
>>>>> 
>>>>> I've updated the repair branch with everyone's code. I think it is
>>>>> faster, due to Adam's idea that if we run the merges in reverse order, those
>>>>> near the front of the file are more likely to be no-ops, so less work is
>>>>> done overall.
>>>>> 
>>>>> Mikeal will be testing for correctness. Could others please use it and
>>>>> test for usability as well? Latest code (with instructions) is here:
>>>>> 
>>>>> http://github.com/jhs/recover-couchdb/
>>>>> 
>>>>> Which points at http://github.com/jchris/couchdb/tree/db_repair for the
>>>>> repair code.
>>>>> 
>>>>> One thing I am not clear about (need better docs) is, do we need to
>>>>> replicate the original db to the lost+found db (or vice-versa), after
>>>>> recovery is complete?
>>>>> 
>>>> 
>>>> Also, we should be clear about what the semantics for this are. It can
>>>> potentially introduce conflicts if some writes were repeated after restarts.
>>>> Should it always be a no-op on dbs that are clean w/r/t the bug?
>>>> 
>>>> Chris
>>>> 
>>>>> Chris
>>>>> 
>>>>>> Adam
>>>>>> 
>>>>>> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>>>>>> 
>>>>>>> The code is updated with the following changes:
>>>>>>> 1. Adhere to the lost+found/databasename custom...
>>>>>>> 2. ...except databases starting with _, which go into
>>>>>>> _system/databasename
>>>>>>> 3. Sync up with jchris's db_repair branch
>>>>>>> 
>>>>>>> (About #2, I started with _/database but I think it's too easy to miss at
>>>>>>> the command line.)
>>>>>>> 
>>>>>>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <jchris@gmail.com> wrote:
>>>>>>> 
>>>>>>>> A few bug reports from my testing:
>>>>>>>> 
>>>>>>>> I launched with this command, as specified in the README:
>>>>>>>> 
>>>>>>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> First of all, it chokes on my _users and _replicator db:
>>>>>>>> 
>>>>>>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>>>>>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
>>>>>>>>                                {error,illegal_database_name}}
>>>>>>>> 
>>>>>>>> That second [error] line is repeated many, many times (once per merge, I
>>>>>>>> think). I think the issue is that _users is hard-coded to be OK, but
>>>>>>>> _users_lost+found is not. So we should do something about that: maybe if a
>>>>>>>> db-name starts with _ we should call the lost and found a_users_lost+found
>>>>>>>> (_ sorts at the top, so "a" will be near it and legal).
>>>>>>>> 
>>>>>>>> 
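The illegal_database_name error above follows from CouchDB's naming rule: a database name must start with a lowercase letter, followed only by lowercase letters, digits, or the characters _ $ ( ) + - /. The following is a minimal Python sketch of that check, not the tool's actual Erlang code; the function name is_legal_db_name is hypothetical, and the sketch ignores the hard-coded exception for _users mentioned in the message.

```python
import re

# CouchDB's documented naming rule: first character a lowercase letter,
# then lowercase letters, digits, or any of _ $ ( ) + - /
# (Special-cased system databases such as _users are NOT modeled here.)
DB_NAME_RE = re.compile(r"[a-z][a-z0-9_$()+/-]*")


def is_legal_db_name(name: str) -> bool:
    """Return True if `name` satisfies the general naming rule (hypothetical helper)."""
    return DB_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("_users_lost+found", "a_users_lost+found", "bench_lost+found"):
        print(candidate, is_legal_db_name(candidate))
```

Under this rule the leading underscore alone makes _users_lost+found illegal, while the proposed a_users_lost+found passes.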
>>>>>>>> 
>>>>>>>> When a database has readers defined in the security object, the tool is
>>>>>>>> unable to open it (the reading part of the repair tool needs to have the
>>>>>>>> _admin userCtx, not just the writer).
>>>>>>>> 
>>>>>>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs
>>>>>>>> Names [<<"joe">>] Roles [<<"_admin">>]
>>>>>>>> escript: exception throw: {unauthorized,<<"You are not authorized to access
>>>>>>>> this db.">>}
>>>>>>>> in function  couch_db:open/2
>>>>>>>> in call from couch_db_repair:make_lost_and_found/3
>>>>>>>> in call from recover_couchdb:main/1
>>>>>>>> in call from escript:run/2
>>>>>>>> in call from escript:start/1
>>>>>>>> in call from init:start_it/1
>>>>>>>> in call from init:start_em/1
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It would also be helpful if the status lines could say something more than
>>>>>>>> 
>>>>>>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>>>>>> 
>>>>>>>> Like maybe add a note like "about 23% complete" if at all possible.
>>>>>>>> 
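One way such a note could be produced is to compare the bytes processed so far with the total file size reported in the "scanning N bytes" line. This is a rough Python sketch under that assumption; the function progress_note and its parameters are hypothetical, and the real tool (written in Erlang) may track progress differently.

```python
def progress_note(bytes_done: int, total_bytes: int) -> str:
    """Format a rough progress note from a byte count (hypothetical helper).

    bytes_done:  bytes of the .couch file processed so far (assumed to be known)
    total_bytes: total size of the file being scanned
    """
    if total_bytes <= 0:
        # Avoid dividing by zero on an empty or unknown-size file.
        return "progress unknown"
    pct = round(100 * bytes_done / total_bytes)
    # Clamp in case the counter overshoots or goes negative.
    pct = min(100, max(0, pct))
    return f"about {pct}% complete"


if __name__ == "__main__":
    # 335961 is the file size from the log excerpt earlier in this message.
    print(progress_note(77271, 335961))
```

If the merges run in reverse order, as discussed elsewhere in the thread, bytes_done would be the distance from the end of the file rather than the raw offset.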
>>>>>>>> 
>>>>>>>> I will patch the first few, I'd love help from someone on the last one.
>>>>>>>> I'll be on IRC.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, Jason.
>>>>>>>>>> 
>>>>>>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <jhs@couch.io> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <kocolosk@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Excellent, thanks for testing.  I caught Jason Smith saying on IRC that he
>>>>>>>>>>>> had packaged the whole thing up as an escript + some .beams.  If we can get
>>>>>>>>>>>> it down to a single file a la rebar that would be a pretty sweet way to
>>>>>>>>>>>> deliver the repair tool in my opinion.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I think it is important that we package and release this, if it is ready.
>>>>>>>>> We should link to it from the bug description page, the project home page,
>>>>>>>>> as well as blog about it, etc. What is the point of working feverishly on a
>>>>>>>>> recovery tool if we don't go the last mile?
>>>>>>>>> 
>>>>>>>>> I am testing it now on my database directory to make sure it doesn't harm
>>>>>>>>> anything (I was never subject to the bug, which is probably where most
>>>>>>>>> people are, but they might run it anyway.)
>>>>>>>>> 
>>>>>>>>> As it stands the submodules thing can't be part of the release, we need
>>>>>>>>> to package it up as a single zip file or something.
>>>>>>>>> 
>>>>>>>>> Is there anything else that needs to be done before we can release this?
>>>>>>>>> 
>>>>>>>>> Chris
>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jason Smith
>>>>>>>>>> Couchio Hosting
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jason Smith
>>>>>>> Couchio Hosting
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
> 

