From: Robert Newson
To: dev@couchdb.apache.org
Date: Tue, 10 Aug 2010 19:29:13 +0100
Subject: Re: data recovery tool progress

It took 20 minutes before the first 'update' line came out, but now
seems to be recovering smoothly. machine load is back down to sane
levels. Suggest feedback during the hunting phase.

B.

On Tue, Aug 10, 2010 at 7:11 PM, Adam Kocoloski wrote:
> Thanks for the crosscheck. I'm not aware of anything in the node finder that would cause it to struggle mightily with healthy DBs. It pretty much ignores the health of the DB, in fact. Would be interested to hear more.
>
> On Aug 10, 2010, at 1:59 PM, Robert Newson wrote:
>
>> I verified the new code's ability to repair the testwritesdb. system
>> load was smooth from start to finish.
>>
>> I started a further test on a different (healthy) database and system
>> load was severe again, just collecting the roots (the lost+found db
>> was not yet created when I aborted the attempt). I suspect the fact
>> that it's healthy is the issue, so if I'm right, perhaps a warning is
>> useful.
>>
>> B.
>>
>> On Tue, Aug 10, 2010 at 6:53 PM, Adam Kocoloski wrote:
>>> Another update. This morning I took a different tack and, rather than try to find root nodes, I just looked for all kv_nodes in the file and treated each of those as a separate virtual DB to be replicated. This reduces the algorithmic complexity of the repair, and it looks like testwritesdb repairs in ~30 minutes or so. Also, this method results in the lost+found DB containing every document, not just the missing ones.
>>>
>>> My branch does not currently include Randall's parallelization of the replications. It's still CPU-limited, so that may be a worthwhile optimization. On the other hand, I think we may be reaching a stage at which performance for this repair tool is 'good enough', and pmaps can make error handling a bit dicey.
>>>
>>> In short, I think this tool is now in good shape.
>>>
>>> http://github.com/kocolosk/couchdb/tree/db_repair
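
For readers following the thread, the "look for all kv_nodes in the file" step quoted above can be pictured with a rough Python sketch. This is not the code in the db_repair branch. It assumes that nodes are written uncompressed with Erlang's term_to_binary, so that a {kv_node, ...} tuple begins with a fixed byte pattern, and it ignores any block-boundary bytes the file format may interleave. The names KV_NODE_MARKER and find_kv_node_offsets are hypothetical.

```python
from typing import List

# Assumed marker: Erlang external term format for the start of a 2-tuple
# whose first element is the atom kv_node.
#   131      version tag
#   104, 2   SMALL_TUPLE_EXT with arity 2
#   100, 0, 7  ATOM_EXT with a 2-byte length of 7
KV_NODE_MARKER = bytes([131, 104, 2, 100, 0, 7]) + b"kv_node"


def find_kv_node_offsets(data: bytes) -> List[int]:
    """Return every byte offset at which the assumed kv_node marker starts."""
    offsets = []
    pos = data.find(KV_NODE_MARKER)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so no occurrence is skipped.
        pos = data.find(KV_NODE_MARKER, pos + 1)
    return offsets


if __name__ == "__main__":
    # Synthetic bytes standing in for a database file, not real CouchDB data.
    sample = b"\x00" * 5 + KV_NODE_MARKER + b"junk" + KV_NODE_MARKER
    print(find_kv_node_offsets(sample))
```

Each offset found this way would then be a candidate leaf node to treat as its own virtual DB, which is why the scan does not depend on whether the database is healthy.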