From: anew
To: dev@tephra.incubator.apache.org
Reply-To: dev@tephra.incubator.apache.org
Subject: [GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...
Content-Type: text/plain
Message-Id: <20161203162142.D0FBAF170A@git1-us-west.apache.org>
Date: Sat, 3 Dec 2016 16:21:42 +0000 (UTC)

Github user anew commented on a diff in the pull request:

    https://github.com/apache/incubator-tephra/pull/20#discussion_r90759195

    --- Diff: tephra-core/src/main/java/org/apache/tephra/janitor/TransactionPruningPlugin.java ---
    @@ -0,0 +1,90 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +package org.apache.tephra.janitor;
    +
    +import org.apache.hadoop.conf.Configuration;
    +
    +import java.io.IOException;
    +
    +/**
    + * Data janitor interface to manage the invalid transaction list.
    + *
    + *
    + * An invalid transaction can only be removed from the invalid list after the data written
    + * by the invalid transactions has been removed from all the data stores.
    + * The term data store is used here to represent a set of tables in a database that have
    + * the same data clean up policy, like all Apache Phoenix tables in an HBase instance.
    + *
    + *
    + * Typically every data store will have a background job which cleans up the data written by invalid transactions.
    + * Prune upper bound for a data store is defined as the largest invalid transaction whose data has been
    + * cleaned up from that data store.
    + *
    + * prune-upper-bound = min(max(invalid list), min(in-progress list) - 1)
    + *
    + * where invalid list and in-progress list are from the transaction snapshot used to clean up the invalid data in the
    + * data store.
    + *
    + *
    + * There will be one such plugin per data store. The plugins will be executed as part of the Transaction Service.
    + * Each plugin will be invoked periodically to fetch the prune upper bound for its data store.
    + * Invalid transaction list can pruned up to the minimum of prune upper bounds returned by all the plugins.
    + */
    +public interface TransactionPruningPlugin {
    +  /**
    +   * Called once when the Transaction Service starts up.
    +   *
    +   * @param conf configuration for the plugin
    +   */
    +  void initialize(Configuration conf) throws IOException;
    +
    +  /**
    +   * Called periodically to fetch prune upper bound for a data store. The plugin examines the state of data cleanup
    +   * in the data store and determines the smallest invalid transaction whose writes no longer exist in the data
    +   * store. It then returns this smallest invalid transaction as the prune upper bound for this data store.
    +   *
    +   * @param time start time of this prune iteration in milliseconds
    +   * @param pruneUpperBoundForTime the largest invalid transaction that can be possibly removed
    +   *                               from the invalid list for the given time.
    +   *                               In terms of HBase, this is the smallest not in-progress transaction that will
    +   *                               not have writes in any HBase regions that are created after the given time.
    +   *                               The plugin will typically return a reduced upper bound based on the state of
    +   *                               the invalid transaction data clean up in the data store.
    --- End diff --

    I still don't understand what this is. I thought this is an upper bound determined by the tx manager, based on its knowledge of what invalid transactions may still have active processes and therefore future writes?
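
For reference, the two rules stated in the class Javadoc quoted above (the per-data-store formula and "the minimum of prune upper bounds returned by all the plugins") can be sketched roughly as follows. This is only an illustration and not code from the pull request: the class name, the method names, and the -1 sentinel for "nothing to prune" are hypothetical, and transaction ids are assumed to be plain long values.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch; not part of Tephra.
final class PruneUpperBoundSketch {

  // Assumed sentinel meaning "no transaction can be pruned".
  static final long NO_BOUND = -1L;

  // prune-upper-bound = min(max(invalid list), min(in-progress list) - 1),
  // where both lists come from the snapshot used for the cleanup.
  static long pruneUpperBound(Collection<Long> invalid, Collection<Long> inProgress) {
    if (invalid.isEmpty()) {
      return NO_BOUND; // nothing invalid, so nothing to prune
    }
    long maxInvalid = Collections.max(invalid);
    if (inProgress.isEmpty()) {
      return maxInvalid; // no in-progress transaction limits the bound
    }
    return Math.min(maxInvalid, Collections.min(inProgress) - 1);
  }

  // The invalid list may be pruned up to the minimum of the bounds
  // reported for all data stores.
  static long globalPruneUpperBound(Collection<Long> perStoreBounds) {
    if (perStoreBounds.isEmpty()) {
      return NO_BOUND;
    }
    return Collections.min(perStoreBounds);
  }

  public static void main(String[] args) {
    // Data store A: cleaned up with a snapshot that had in-progress tx 18 and 30.
    long storeA = pruneUpperBound(Arrays.asList(10L, 15L, 22L), Arrays.asList(18L, 30L));
    // Data store B: cleaned up with an older snapshot.
    long storeB = pruneUpperBound(Arrays.asList(10L, 15L), Arrays.asList(25L));
    long global = globalPruneUpperBound(Arrays.asList(storeA, storeB));
    System.out.println("A=" + storeA + " B=" + storeB + " global=" + global);
  }
}
```

With these made-up snapshots, store A yields min(22, 18 - 1) = 17, store B yields min(15, 25 - 1) = 15, and the global bound is 15. The sketch does not model the pruneUpperBoundForTime parameter that the comment above asks about.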