Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Reply-To: user@accumulo.apache.org
Delivered-To: mailing list user@accumulo.apache.org
MIME-Version: 1.0
In-Reply-To: <56DEFDCB.109@ccri.com>
References: <56DEFDCB.109@ccri.com>
Date: Thu, 17 Mar 2016 20:18:27 -0400
Subject: Re: Recovery file versus directory
From: Michael Wall
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=001a114edd4aa2415d052e47af5a

--001a114edd4aa2415d052e47af5a
Content-Type: text/plain; charset=UTF-8

Andrew,

Sounds a lot like https://issues.apache.org/jira/browse/ACCUMULO-4157. I'll
look to see if what you describe could also happen with this bug. If you
still have the gc logs, can you look for a message like "Removing WAL for
offline server" with the uuid?

Mike

On Tue, Mar 8, 2016 at 11:28 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:

> Hi folks,
>
> We experienced a problem this morning with a recovery on 1.6.1 that went
> something like this:
>
> FileNotFoundException: File does not exist:
> hdfs:///accumulo/recovery/<uuid>/failed/data
>
> at Tablet.java:1410
> at Tablet.java:1233
> etc.
> at TabletServer:2923
>
> Interestingly enough, hdfs:///accumulo/recovery/<uuid>/failed was a 0-byte
> file, not a directory, and it was preventing tablets from getting assigned.
> (I am not sure what caused the original failure, but I believe a tserver
> node was going down: the master indicated it was trying to shut down a
> tserver that was so bad off that someone just rekicked the node.)
>
> I looked through the fixes for 1.6.2 through 1.6.5 and didn't see anything
> related on the release notes pages, but I haven't gone through all the
> tickets yet. I haven't been able to get anyone to upgrade to 1.6.5 yet, and
> perhaps it's already fixed.
>
> Just wondering if that's something that has been seen before?
>
> In order to fix it I just deleted the failed file and it proceeded.
>
> Thanks!
>
> Andrew
>

--001a114edd4aa2415d052e47af5a
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Andrew,

Sounds a lot like https://issues.apache.org/jira/browse/ACCUMULO-4157. I'll look to see if what you describe could also happen with this bug. If you still have the gc logs, can you look for a message like "Removing WAL for offline server" with the uuid?
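
For anyone who wants to script that search, here is a minimal sketch. It assumes only that the gc log is plain text and that the message text and the uuid appear on the same line; the function name `find_wal_removals` is hypothetical, and nothing here reflects the actual Accumulo log layout.

```python
from typing import Iterable, List

# Message text quoted from the request above. Everything else on the log
# line (timestamp, level, path layout) is an assumption and is not parsed.
WAL_REMOVAL_MESSAGE = "Removing WAL for offline server"


def find_wal_removals(lines: Iterable[str], uuid: str) -> List[str]:
    """Return the log lines that contain both the removal message and the uuid."""
    return [
        line.rstrip("\n")
        for line in lines
        if WAL_REMOVAL_MESSAGE in line and uuid in line
    ]
```

Pass it an open log file and the uuid from the recovery path, then print whatever comes back. An empty result only means the message was not found in that particular file.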

Mike

On Tue, Mar 8, 2016 at 11:28 AM, Andrew Hulbert <ahulbert@ccri.com> wrote:
Hi folks,

We experienced a problem this morning with a recovery on 1.6.1 that went something like this:

FileNotFoundException: File does not exist: hdfs:///accumulo/recovery/<uuid>/failed/data

at Tablet.java:1410
at Tablet.java:1233
etc.
at TabletServer:2923

Interestingly enough, hdfs:///accumulo/recovery/<uuid>/failed was a 0-byte file, not a directory, and it was preventing tablets from getting assigned. (I am not sure what caused the original failure, but I believe a tserver node was going down: the master indicated it was trying to shut down a tserver that was so bad off that someone just rekicked the node.)

I looked through the fixes for 1.6.2 through 1.6.5 and didn't see anything related on the release notes pages, but I haven't gone through all the tickets yet. I haven't been able to get anyone to upgrade to 1.6.5 yet, and perhaps it's already fixed.

Just wondering if that's something that has been seen before?

In order to fix it I just deleted the failed file and it proceeded
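
Before deleting anything, it may help to confirm what the `failed` path actually is. The sketch below is one rough way to do that, assuming the `hdfs` command is on the PATH and relying on the exit status of `hdfs dfs -test` with the `-e`, `-d` and `-z` flags. The helper names are hypothetical, and the path (including the uuid) has to be supplied by the caller. It only inspects the path and never removes it.

```python
import subprocess
from typing import Callable, List

# A runner takes the command as a list and returns its exit status.
Runner = Callable[[List[str]], int]


def hdfs_test(flag: str, path: str, run: Runner = subprocess.call) -> bool:
    """Run `hdfs dfs -test <flag> <path>`; exit status 0 means the test passed."""
    return run(["hdfs", "dfs", "-test", flag, path]) == 0


def describe_failed_marker(path: str, run: Runner = subprocess.call) -> str:
    """Classify the recovery `failed` path without modifying it."""
    if not hdfs_test("-e", path, run):
        return "missing"
    if hdfs_test("-d", path, run):
        return "directory"
    if hdfs_test("-z", path, run):
        return "zero-byte file"
    return "non-empty file"
```

The `run` parameter exists only so the check can be exercised without a cluster. In normal use, leave it at its default and pass the full recovery path.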

Thanks!

Andrew

--001a114edd4aa2415d052e47af5a--