From: anew
To: dev@tephra.incubator.apache.org
Reply-To: dev@tephra.incubator.apache.org
Subject: [GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...
Content-Type: text/plain
Message-Id: <20161203162142.EF034E02E4@git1-us-west.apache.org>
Date: Sat, 3 Dec 2016 16:21:42 +0000 (UTC)

Github user anew commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/20#discussion_r90759660

--- Diff: tephra-hbase-compat-1.1-base/src/main/java/org/apache/tephra/hbase/coprocessor/janitor/HBaseTransactionPruningPlugin.java ---
@@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.hbase.coprocessor.janitor;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.HTableDescriptor;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Admin;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Table;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.tephra.TxConstants;
+import org.apache.tephra.hbase.coprocessor.TransactionProcessor;
+import org.apache.tephra.janitor.TransactionPruningPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+/**
+ * Default implementation of the {@link TransactionPruningPlugin} for HBase.
+ *
+ * This plugin determines the prune upper bound for transactional HBase tables that use
+ * coprocessor {@link TransactionProcessor}.
+ *
+ * State storage:
+ *
+ * This plugin expects the TransactionProcessor to save the prune upper bound for invalid transactions
+ * after every major compaction of a region. Let's call this (region, prune upper bound).
+ * In addition, the plugin also persists the following information on a run at time t
+ *
+ *   • (t, set of regions): Set of transactional regions at time t.
+ *     Transactional regions are regions of the tables that have the coprocessor TransactionProcessor
+ *     attached to them.
+ *   • (t, prune upper bound): This is the smallest not in-progress transaction that
+ *     will not have writes in any HBase regions that are created after time t.
+ *     This value is determined by the Transaction Service based on the transaction state at time t
+ *     and passed on to the plugin.
+ *
+ * Computing prune upper bound:
+ *
+ * In a typical HBase instance, there can be a constant change in the number of regions due to region creations,
+ * splits and merges. At any given time there can always be a region on which a major compaction has not been run.
+ * Since the prune upper bound will get recorded for a region only after a major compaction,
+ * using only the latest set of regions we may not be able to find the
+ * prune upper bounds for all the current regions. Hence we persist the set of regions that exist at that time
+ * of each run of the plugin, and use historical region set for time t, t - 1, etc.
+ * to determine the prune upper bound.
+ *
+ * From the regions saved at time t, t - 1, etc.,
+ * the plugin tries to find the latest (t, set of regions) where all regions have been major compacted,
+ * i.e., all regions have prune upper bound recorded in (region, prune upper bound).
+ *
+ * If such a set is found for time t1, the prune upper bound returned by the plugin is the minimum of
+ *
+ *   • Prune upper bounds of regions in set (t1, set of regions)
+ *   • Prune upper bound from (t1, prune upper bound)
+ *
+ * Above, when we find (t1, set of regions), there may be a region that was created after time t1,
+ * but has a data write from an invalid transaction that is smaller than the prune upper bounds of all
+ * regions in (t1, set of regions). This is possible because (region, prune upper bound) persisted by
+ * TransactionProcessor is always the latest prune upper bound for a region.
+ *
+ * However a region created after time t1 cannot have writes from an invalid transaction that is smaller than
+ * min(max(invalid list), min(in-progress list) - 1) at the time the region was created.
+ * Since we limit the plugin prune upper bound using (t1, prune upper bound), there should be no invalid
--- End diff --

Also, I am not sure whether min(max(invalid list), min(in-progress list) - 1) is correct. A transaction may still generate writes after it has become invalid. For example, if the transaction timeout is 30 seconds, the client may perform a write after 60 seconds, then crash and never commit or roll back. If we prune this transaction between the 30-second and 60-second marks, we will have an invalid write in the store.
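
To make the concern concrete, here is a minimal sketch of the computation described in the Javadoc, followed by the late-write case. It is an in-memory model only: the class PruneSketch, its methods, the maps standing in for the persisted state, and all the numbers are assumptions made up for illustration, not Tephra API or the code in this pull request.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified in-memory model. Every name here is hypothetical; none of this is Tephra API.
class PruneSketch {

    // Walks the saved region sets from latest to earliest and returns
    // min(region bounds, bound saved for that time) for the first time t at which
    // every region has a recorded bound. Returns -1 if no such time exists.
    static long computePruneUpperBound(NavigableMap<Long, Set<String>> regionsAtTime,
                                       Map<Long, Long> boundAtTime,
                                       Map<String, Long> regionBound) {
        for (Map.Entry<Long, Set<String>> entry : regionsAtTime.descendingMap().entrySet()) {
            Long timeBound = boundAtTime.get(entry.getKey());
            Set<String> regions = entry.getValue();
            if (timeBound == null || !regionBound.keySet().containsAll(regions)) {
                continue; // some region in this set has no recorded bound yet
            }
            long min = timeBound;
            for (String region : regions) {
                min = Math.min(min, regionBound.get(region));
            }
            return min;
        }
        return -1L;
    }

    // Removes every invalid transaction id <= pruneUpperBound and returns the rest.
    static Set<Long> prune(Set<Long> invalid, long pruneUpperBound) {
        Set<Long> remaining = new TreeSet<>();
        for (long tx : invalid) {
            if (tx > pruneUpperBound) {
                remaining.add(tx);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        NavigableMap<Long, Set<String>> regionsAtTime = new TreeMap<>();
        regionsAtTime.put(100L, new TreeSet<>(Arrays.asList("r1", "r2")));
        regionsAtTime.put(200L, new TreeSet<>(Arrays.asList("r1", "r2", "r3")));

        Map<Long, Long> boundAtTime = new TreeMap<>();
        boundAtTime.put(100L, 45L);
        boundAtTime.put(200L, 60L);

        Map<String, Long> regionBound = new TreeMap<>();
        regionBound.put("r1", 50L);
        regionBound.put("r2", 40L); // r3 has not been major compacted yet

        long bound = computePruneUpperBound(regionsAtTime, boundAtTime, regionBound);
        System.out.println("prune upper bound = " + bound);

        // Transaction 30 timed out and was invalidated; it falls below the bound and is pruned.
        Set<Long> invalid = new TreeSet<>(Arrays.asList(30L, 70L));
        Set<Long> afterPrune = prune(invalid, bound);

        // A write that transaction 30 issues after this point can no longer be
        // recognized as invalid, because 30 has left the invalid list.
        System.out.println("tx 30 still in invalid list? " + afterPrune.contains(30L));
    }
}
```

In this model the bound is taken from the set saved at time 100, because r3 in the later set has no recorded bound. Once transaction 30 is pruned, nothing in the remaining state marks a later write from it as invalid, which is the 30-to-60-second window described above.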