Date: Mon, 4 Dec 2017 17:09:01 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

    [ https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277096#comment-16277096 ]

Josh Elser commented on ACCUMULO-4751:
--------------------------------------

I would say #2 at first glance, but I am more worried about how we missed the createdTime situation. Ideally, even in the face of TServer crashes, the TServer would set the correct "metadata" on each Status record.

Do you have any hunch as to how this record exists without the createdTime attribute set? It would be nice to confirm that we don't have some other bug lingering in which we're just not writing the record correctly.

I wouldn't be surprised if we actually need some kind of solution (like #2) to guard against an unlikely situation (e.g. tserver failure) in addition to fixing another bug. In other words, a catch-all to prevent the system from "wedging" on these WALs would be appreciated.

> Some WALs don't replicate due to lacking a createdTime entry
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-4751
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4751
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.7.3, 1.8.1
>            Reporter: Adam J Shook
>            Assignee: Adam J Shook
>
> From what I can tell, the below error is thrown when no data for a particular table is written to a WAL, but the file is closed. This would be because the {{Status}} entry from the {{StatusUtil}} for {{fileClosed}} is pre-built and therefore does not have a {{createdTime}}. This prevents a WAL from being replicated until a {{createdTime}} entry is added manually.
>
> From the Accumulo master:
> {code}
> Status record ([begin: 0 end: 0 infiniteEnd: true closed: true]) for hdfs://namenode:9000/accumulo/wal/tserver.example.com+31732/f922df9c-3ffc-49ee-8d0c-261c7a05fea2 in table 7l was written to metadata table which lacked createdTime
> {code}
>
> There are two solutions I have in mind:
> 1. Update the {{StatusUtil}} such that every returned {{Status}} object sets the {{createdTime}} to {{System.currentTimeMillis}} if not explicitly given.
> 2. Update the Accumulo Master to set the {{createdTime}} to the WAL's modification time in HDFS if the WAL is closed but there is no {{createdTime}}.
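
The two proposed fixes in the issue description can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not Accumulo code: `WalStatus`, `fileClosed(...)`, and `withFallbackCreatedTime(...)` are hypothetical stand-ins for the real `Status` record and `StatusUtil` helpers, and the WAL's HDFS modification time is passed in as a plain `long` instead of being read from the file system.

```java
import java.util.OptionalLong;

// Hypothetical sketch: none of these types are Accumulo's real classes.
final class CreatedTimeSketch {

    // Simplified stand-in for a replication Status record.
    static final class WalStatus {
        final boolean closed;
        final OptionalLong createdTime; // empty models the missing createdTime

        WalStatus(boolean closed, OptionalLong createdTime) {
            this.closed = closed;
            this.createdTime = createdTime;
        }
    }

    // Option 1 (assumed shape): when building a "file closed" status,
    // default createdTime to the supplied current time if none was given.
    static WalStatus fileClosed(OptionalLong createdTime, long nowMillis) {
        OptionalLong effective =
            createdTime.isPresent() ? createdTime : OptionalLong.of(nowMillis);
        return new WalStatus(true, effective);
    }

    // Option 2 (assumed shape): on the master side, if a closed record has
    // no createdTime, fall back to the WAL's modification time.
    // Open records and records that already carry a createdTime are
    // returned unchanged.
    static WalStatus withFallbackCreatedTime(WalStatus status,
                                             long walModificationTimeMillis) {
        if (status.closed && !status.createdTime.isPresent()) {
            return new WalStatus(true, OptionalLong.of(walModificationTimeMillis));
        }
        return status;
    }
}
```

In this sketch, option 1 prevents new records from being written without a `createdTime`, while option 2 acts as the catch-all for records that are already in the metadata table without one, so the two are complementary rather than exclusive.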