Reply-To: user@accumulo.apache.org
MIME-Version: 1.0
In-Reply-To: <3a3d276b-bd3c-61bf-58f6-5c885c253f9c@ccri.com>
References: <3a3d276b-bd3c-61bf-58f6-5c885c253f9c@ccri.com>
From: Michael Wall
Date: Tue, 18 Oct 2016 09:46:08 -0400
Subject: Re: java.IO.EOFException: ..../accumulo/recovery/.../part-r-00000/index not a SequenceFile.
To: Accumulo User List
Content-Type: multipart/alternative; boundary=94eb2c19ad5c344aa5053f23eaf5
archived-at: Tue, 18 Oct 2016 13:46:16 -0000

--94eb2c19ad5c344aa5053f23eaf5
Content-Type: text/plain; charset=UTF-8

Andrew,

That is what I was going to suggest you try.  Where is that "Unable to find
recovery files for extent" log?  Any way we can see some actual logs?

Are all the WALs there?  Do you find any of the WALs deleted by GC in the gc
logs?  Do you find any duplicate WALs in the HDFS trash?

On Tue, Oct 18, 2016 at 9:32 AM, Andrew Hulbert wrote:

> Mike,
>
> For one of the WALs I backed up the recovery directory and that initiated
> a new recovery attempt as indicated in the tserver debug log...
>
> Then the exception was thrown:
>
> Unable to find recovery files for extent xxxxxx logentry xxxxx
> hdfs://path/to/wal/yyyy
>
> Any ideas? I figure we can zero out the WAL and it will go on with life
> but it would be nice to try and get the data!
>
> Thanks!
>
>
> On 10/18/2016 08:55 AM, Jeff Kubina wrote:
>
>
> On Tue, Oct 18, 2016 at 6:32 AM, Michael Wall wrote:
>
>> Take a look at the master logs for where the WAL was sorted to the /accumulo/recovery/...
>> directory.  Then look to see if those WALs are still around and contain
>> content.
>>
>
> Checked one of them, yes it is around with content.
>
>> Where is this EOF exception, on a tserver?
>>
>
> Yes, the tserver.
>
>
>> Is the master log complaining about anything?
>>
>
> Repeating a message similar to the tserver but also that the tablet
> assignment failed for the tserver.
>
> tservers are not balancing because of all this.

--94eb2c19ad5c344aa5053f23eaf5--
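
For anyone following the thread: one way to see which WALs the "Unable to find recovery files for extent" messages refer to is to pull the hdfs:// paths out of the tserver log lines. The sketch below is only an illustration and is not part of Accumulo. The function name wals_missing_recovery is hypothetical, and it assumes the WAL location appears on the same line as an hdfs:// URI, which is based solely on the redacted message quoted in the thread; real log lines may be laid out differently.

```python
import re
from collections import Counter

# Message text as quoted (redacted) in the thread; the exact wording in a
# real tserver log is an assumption.
MARKER = "Unable to find recovery files for extent"

# Assumption: the WAL location appears on the same line as an hdfs:// URI.
WAL_URI = re.compile(r"hdfs://\S+")


def wals_missing_recovery(log_lines):
    """Count, per WAL URI, how many lines report missing recovery files."""
    counts = Counter()
    for line in log_lines:
        if MARKER not in line:
            continue  # unrelated log line
        match = WAL_URI.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Unable to find recovery files for extent xxxxxx logentry xxxxx "
        "hdfs://path/to/wal/yyyy",
        "some unrelated tserver line",
    ]
    for wal, n in wals_missing_recovery(sample).items():
        print(n, wal)
```

The WAL paths collected this way could then be checked against the gc logs and the HDFS trash, as suggested in the reply above.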