tephra-dev mailing list archives

From anew <...@git.apache.org>
Subject [GitHub] incubator-tephra pull request #20: Compute global prune upper bound using co...
Date Sat, 03 Dec 2016 16:21:42 GMT
Github user anew commented on a diff in the pull request:

    --- Diff: tephra-core/src/main/java/org/apache/tephra/janitor/TransactionPruningPlugin.java
    @@ -0,0 +1,90 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.tephra.janitor;
    +import org.apache.hadoop.conf.Configuration;
    +import java.io.IOException;
    +/**
    + * Data janitor interface to manage the invalid transaction list.
    + *
    + * <p/>
    + * An invalid transaction can only be removed from the invalid list after the data written
    + * by the invalid transactions has been removed from all the data stores.
    + * The term data store is used here to represent a set of tables in a database that have
    + * the same data clean up policy, like all Apache Phoenix tables in an HBase instance.
    + *
    + * <p/>
    + * Typically every data store will have a background job which cleans up the data written by invalid transactions.
    + * Prune upper bound for a data store is defined as the largest invalid transaction whose data has been
    + * cleaned up from that data store.
    + * <pre>
    + * prune-upper-bound = min(max(invalid list), min(in-progress list) - 1)
    + * </pre>
    + * where invalid list and in-progress list are from the transaction snapshot used to clean up the invalid data in the
    + * data store.
    + *
    + * <p/>
    + * There will be one such plugin per data store. The plugins will be executed as part of the Transaction Service.
    + * Each plugin will be invoked periodically to fetch the prune upper bound for its data store.
    + * Invalid transaction list can be pruned up to the minimum of prune upper bounds returned by all the plugins.
    + */
    +public interface TransactionPruningPlugin {
    +  /**
    +   * Called once when the Transaction Service starts up.
    +   *
    +   * @param conf configuration for the plugin
    +   */
    +  void initialize(Configuration conf) throws IOException;
    +  /**
    +   * Called periodically to fetch prune upper bound for a data store. The plugin examines the state of data cleanup
    +   * in the data store and determines the smallest invalid transaction whose writes no longer exist in the data
    +   * store. It then returns this smallest invalid transaction as the prune upper bound for this data store.
    +   *
    +   * @param time start time of this prune iteration in milliseconds
    +   * @param pruneUpperBoundForTime the largest invalid transaction that can be possibly removed
    +   *                               from the invalid list for the given time.
    +   *                               In terms of HBase, this is the smallest not in-progress transaction that will
    +   *                               not have writes in any HBase regions that are created after the given time.
    +   *                               The plugin will typically return a reduced upper bound based on the state of
    +   *                               the invalid transaction data clean up in the data
    --- End diff --
    I still don't understand what this is. I thought this is an upper bound determined by the
tx manager, based on its knowledge of which invalid transactions may still have active processes
and therefore future writes?
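
For reference, the formula in the quoted javadoc, and the rule that the invalid list can be pruned up to the minimum of the bounds returned by all plugins, can be sketched roughly as below. This is only an illustration and not code from the pull request: the class name, both method names, and the use of -1 to mean "nothing can be pruned" are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical illustration only; not part of Tephra.
final class PruneUpperBoundSketch {

  // Assumed sentinel meaning "nothing can be pruned".
  static final long NOTHING_TO_PRUNE = -1L;

  // Per data store:
  //   prune-upper-bound = min(max(invalid list), min(in-progress list) - 1)
  // where both lists come from the snapshot used for the cleanup.
  static long pruneUpperBound(long[] invalid, long[] inProgress) {
    if (invalid.length == 0) {
      return NOTHING_TO_PRUNE;
    }
    long maxInvalid = Arrays.stream(invalid).max().getAsLong();
    if (inProgress.length == 0) {
      // No in-progress transactions, so only the invalid list limits the bound.
      return maxInvalid;
    }
    long minInProgress = Arrays.stream(inProgress).min().getAsLong();
    return Math.min(maxInvalid, minInProgress - 1);
  }

  // Across data stores: the invalid list can be pruned up to the minimum
  // of the bounds returned by all plugins.
  static long globalPruneUpperBound(Collection<Long> perStoreBounds) {
    if (perStoreBounds.isEmpty()) {
      return NOTHING_TO_PRUNE;
    }
    return Collections.min(perStoreBounds);
  }
}
```

With an invalid list of {10, 20, 30} and an in-progress list of {25, 40}, the per-store bound is min(30, 25 - 1) = 24. If two other data stores report 18 and 30, the global bound is 18.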

