Subject: Re: Compact not completing
From: Adam Kocoloski
Date: Sun, 2 Jan 2011 17:06:36 -0500
To: user@couchdb.apache.org
Message-Id: <1AF66366-0D6F-41D5-8C8C-4D88BCAB70F9@apache.org>
In-Reply-To: <20110102024357.gkhnfjeu3o4cskwk@webmail.loop.com.br>

Ah, Mike, I didn't get the instructions right in step 1. Sorry about that. What you really want are the last 1000 Ids in the seq_tree prior to the compactor crash. So maybe something like

GET /iris/_changes?descending=true&limit=1000&since=96282148

Regards, Adam

On Jan 2, 2011, at 12:43 AM, mike@loop.com.br wrote:

> Adam,
>
> Thanks for an excellent explanation. It was easy to find the culprit:
>
> curl -s '172.17.17.3:5984/iris/_changes?since=96281148&limit=1000&include_docs=true' | grep -v time
> {"results":[
> {"seq":96281622,"id":"1292252400F7005","changes":[{"rev":"2-d94be4c93931a35524b3f34b9de41a11"}],"deleted":true,"doc":{"_id":"1292252400F7005","_rev":"2-d94be4c93931a35524b3f34b9de41a11","_deleted":true}},
> ],
> "last_seq":96282306}
>
> The problem I have is that the document exists with a different rev and is not
> deleted:
>
> curl -s '172.17.17.3:5984/iris/1292252400F7005'
> {"_id":"1292252400F7005","_rev":"1-74a74942107db308d42864e50c1517aa", ....
>
> I deleted the document and inserted it again, but the changes feed remains
> the same as above - I presume the compact will still fail as before.
>
> Anything else I can do? (I guess I could hack copy_docs so that not_found
> is not 'fatal'.)
>
> I am compacting regardless, maybe it'll pass.....
>
> Regards,
>
> Mike
>
> Citando Adam Kocoloski:
>
>> Ok, so this is the same error both times.
>> As far as I can tell it indicates that the seq_tree and the id_tree indexes are out of sync; the seq_tree contains some record that isn't present in the id_tree. That's never supposed to happen, so the compactor crashes instead of trying to deal with the 'not_found' result when it does a lookup on the missing entry in the id_tree.
>>
>> I suspect that the _purge code is to blame, since deletions don't actually remove entries from these indexes. One thing you might try:
>>
>> 1) Query _changes starting from 96281148 (1000 less than the last status update) and grab the next 1000 rows
>>
>> 2) Figure out which of those entries are missing from the id_tree, e.g. look up the document and see if the response is {"not_found":"missing"}. You could also try using include_docs=true on the _changes feed to accomplish the same.
>>
>> 3) Once you've identified the problematic IDs, try creating them again. You might end up introducing duplicates in the _changes feed, but if you do there's a procedure to fix that.
>>
>> That's the simplest solution I can think of. Purging them again won't work because the first thing _purge does is look up the Ids in the id_tree.
>>
>> Regards,
>>
>> Adam
>>
>> On Jan 1, 2011, at 9:47 AM, mike@loop.com.br wrote:
>>
>>> I did the same with the tagged 1.0.1. Attached is
>>> the error produced. My responses are below:
>>>
>>> Citando Robert Newson:
>>>
>>>> Some more info would help here.
>>>>
>>>> 1) How far did compaction get?
>>> It gets to seq 96282148 of 109105202, i.e. 88%
>>>
>>>> 2) Do you have enough spare disk space?
>>> Yes I have lots of free space :-)
>>>
>>>> 3) What commit of 1.0.x were you running before you moved to 08d71849?
>>> I was using Dec 13 852fa047. Before that, something at least a month old.
>>>
>>>> B.
>>>>
>>>> On Fri, Dec 31, 2010 at 3:55 PM, Robert Newson wrote:
>>>>> Can you try this with a tagged release like 1.0.1?
>>>>>
>>>>> On Fri, Dec 31, 2010 at 3:38 PM, wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Hoping for some guidance. I have a rather large (295Gb) database that was
>>>>>> created running 1.0.x, and I am pretty certain that there is no corruption - it has
>>>>>> always been on a clean ZFS volume.
>>>>>>
>>>>>> I upgraded to 1.0.x (08d71849464a8e1cc869b385591fa00b3ad0f843 git) in the
>>>>>> hope that it may resolve the issue.
>>>>>>
>>>>>> I have previously '_purge'd many documents from this database, so
>>>>>> that may be relevant.
>>>>>>
>>>>>> I am annexing the error from couchdb.log
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mike
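[Archive note] Adam's three-step repair procedure, quoted above, can be sketched roughly as follows. The database name and sequence number come from the thread; the function name and the sample rows are illustrative stand-ins, not a real CouchDB client - in practice the rows come from `GET /iris/_changes?since=96281148&limit=1000` and the membership test is a per-document `GET /iris/<id>` checked for a `{"not_found":"missing"}` response.

```python
def find_missing_ids(changes_rows, id_tree):
    """Step 2: return IDs present in the seq_tree (i.e. the _changes
    feed) whose lookup in the id_tree fails."""
    return [row["id"] for row in changes_rows if row["id"] not in id_tree]

# Mock data standing in for the real feeds: two seq_tree entries,
# of which only one is resolvable by ID.
changes_rows = [
    {"seq": 96281622, "id": "1292252400F7005", "deleted": True},
    {"seq": 96281623, "id": "1292252400F7006", "deleted": False},
]
id_tree = {"1292252400F7006"}

missing = find_missing_ids(changes_rows, id_tree)
# Step 3 would then re-create each missing ID (e.g. with a PUT),
# so that the compactor's id_tree lookup no longer hits not_found.
print(missing)  # -> ['1292252400F7005']
```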