Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 09066200C0D for ; Tue, 31 Jan 2017 17:44:19 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 07A53160B52; Tue, 31 Jan 2017 16:44:19 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 9CC1B160B36 for ; Tue, 31 Jan 2017 17:44:17 +0100 (CET) Received: (qmail 5411 invoked by uid 500); 31 Jan 2017 16:44:16 -0000 Mailing-List: contact commits-help@quickstep.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@quickstep.incubator.apache.org Delivered-To: mailing list commits@quickstep.incubator.apache.org Received: (qmail 5402 invoked by uid 99); 31 Jan 2017 16:44:16 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 31 Jan 2017 16:44:16 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 552E21A0318 for ; Tue, 31 Jan 2017 16:44:16 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -6.219 X-Spam-Level: X-Spam-Status: No, score=-6.219 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-2.999] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id zz0rOhNLOOXs for ; Tue, 31 Jan 2017 16:44:12 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with SMTP id D50025FAD8 for ; Tue, 31 Jan 2017 16:44:10 +0000 (UTC) Received: (qmail 5304 invoked by uid 99); 31 Jan 2017 16:44:10 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 31 Jan 2017 16:44:10 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 079D3DFD71; Tue, 31 Jan 2017 16:44:10 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: jianqiao@apache.org To: commits@quickstep.incubator.apache.org Message-Id: <9df50ae4e410434d8eaa12538aba72ba@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: incubator-quickstep git commit: Push down low cost disjunctive predicates to filter the stored relations early [Forced Update!] Date: Tue, 31 Jan 2017 16:44:10 +0000 (UTC) archived-at: Tue, 31 Jan 2017 16:44:19 -0000 Repository: incubator-quickstep Updated Branches: refs/heads/push-down-disjunctive-predicate 26e7f80a6 -> b4291d7a8 (forced update) Push down low cost disjunctive predicates to filter the stored relations early Project: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/commit/b4291d7a Tree: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/tree/b4291d7a Diff: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/diff/b4291d7a Branch: refs/heads/push-down-disjunctive-predicate Commit: b4291d7a86623b547eee5be5324eb553dcb23c11 Parents: 6d83b46 Author: Jianqiao Zhu Authored: Mon Jan 30 01:02:19 2017 -0600 Committer: Jianqiao Zhu Committed: Tue Jan 31 10:44:03 2017 -0600 ---------------------------------------------------------------------- query_optimizer/CMakeLists.txt | 1 + query_optimizer/PhysicalGenerator.cpp | 6 + query_optimizer/rules/CMakeLists.txt | 24 ++ .../PushDownLowCostDisjunctivePredicate.cpp | 225 +++++++++++++++++++ .../PushDownLowCostDisjunctivePredicate.hpp | 116 ++++++++++ 5 files changed, 372 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/b4291d7a/query_optimizer/CMakeLists.txt ---------------------------------------------------------------------- diff --git a/query_optimizer/CMakeLists.txt b/query_optimizer/CMakeLists.txt index e8bc21c..0ca971d 100644 --- a/query_optimizer/CMakeLists.txt +++ b/query_optimizer/CMakeLists.txt @@ -207,6 +207,7 @@ target_link_libraries(quickstep_queryoptimizer_PhysicalGenerator quickstep_queryoptimizer_physical_Physical quickstep_queryoptimizer_rules_AttachLIPFilters quickstep_queryoptimizer_rules_PruneColumns + quickstep_queryoptimizer_rules_PushDownLowCostDisjunctivePredicate quickstep_queryoptimizer_rules_ReorderColumns quickstep_queryoptimizer_rules_StarSchemaHashJoinOrderOptimization quickstep_queryoptimizer_rules_SwapProbeBuild http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/b4291d7a/query_optimizer/PhysicalGenerator.cpp ---------------------------------------------------------------------- diff --git a/query_optimizer/PhysicalGenerator.cpp b/query_optimizer/PhysicalGenerator.cpp index e12f8be..5180c79 100644 --- a/query_optimizer/PhysicalGenerator.cpp +++ b/query_optimizer/PhysicalGenerator.cpp @@ -28,6 +28,7 @@ #include "query_optimizer/physical/Physical.hpp" #include "query_optimizer/rules/AttachLIPFilters.hpp" #include "query_optimizer/rules/PruneColumns.hpp" +#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp" #include "query_optimizer/rules/ReorderColumns.hpp" #include "query_optimizer/rules/StarSchemaHashJoinOrderOptimization.hpp" #include "query_optimizer/rules/SwapProbeBuild.hpp" @@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan( P::PhysicalPtr PhysicalGenerator::optimizePlan() { std::vector>> rules; rules.emplace_back(new PruneColumns()); + rules.emplace_back(new PushDownLowCostDisjunctivePredicate()); if (FLAGS_reorder_hash_joins) { rules.emplace_back(new StarSchemaHashJoinOrderOptimization()); rules.emplace_back(new PruneColumns()); @@ -121,6 +123,10 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() { // should be re-evaluated. rules.emplace_back(new ReorderColumns()); } + + // NOTE(jianqiao): Adding rules after AttachLIPFilters requires extra handling + // of LIPFilterConfiguration for transformed nodes. So currently it is suggested + // that all the new rules be placed before this point. if (FLAGS_use_lip_filters) { rules.emplace_back(new AttachLIPFilters()); } http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/b4291d7a/query_optimizer/rules/CMakeLists.txt ---------------------------------------------------------------------- diff --git a/query_optimizer/rules/CMakeLists.txt b/query_optimizer/rules/CMakeLists.txt index fe2fd17..86d1ef7 100644 --- a/query_optimizer/rules/CMakeLists.txt +++ b/query_optimizer/rules/CMakeLists.txt @@ -24,6 +24,9 @@ add_library(quickstep_queryoptimizer_rules_CollapseProject CollapseProject.cpp C add_library(quickstep_queryoptimizer_rules_GenerateJoins GenerateJoins.cpp GenerateJoins.hpp) add_library(quickstep_queryoptimizer_rules_PruneColumns PruneColumns.cpp PruneColumns.hpp) add_library(quickstep_queryoptimizer_rules_PushDownFilter PushDownFilter.cpp PushDownFilter.hpp) +add_library(quickstep_queryoptimizer_rules_PushDownLowCostDisjunctivePredicate + PushDownLowCostDisjunctivePredicate.cpp + PushDownLowCostDisjunctivePredicate.hpp) add_library(quickstep_queryoptimizer_rules_PushDownSemiAntiJoin PushDownSemiAntiJoin.cpp PushDownSemiAntiJoin.hpp) add_library(quickstep_queryoptimizer_rules_ReorderColumns ReorderColumns.cpp ReorderColumns.hpp) add_library(quickstep_queryoptimizer_rules_Rule ../../empty_src.cpp Rule.hpp) @@ -111,6 +114,26 @@ target_link_libraries(quickstep_queryoptimizer_rules_PushDownFilter quickstep_queryoptimizer_rules_RuleHelper quickstep_queryoptimizer_rules_TopDownRule quickstep_utility_Macros) +target_link_libraries(quickstep_queryoptimizer_rules_PushDownLowCostDisjunctivePredicate + ${GFLAGS_LIB_NAME} + quickstep_queryoptimizer_costmodel_StarSchemaSimpleCostModel + quickstep_queryoptimizer_expressions_AttributeReference + quickstep_queryoptimizer_expressions_ExpressionUtil + quickstep_queryoptimizer_expressions_LogicalAnd + quickstep_queryoptimizer_expressions_LogicalOr + quickstep_queryoptimizer_expressions_PatternMatcher + quickstep_queryoptimizer_expressions_Predicate + quickstep_queryoptimizer_physical_Aggregate + quickstep_queryoptimizer_physical_HashJoin + quickstep_queryoptimizer_physical_NestedLoopsJoin + quickstep_queryoptimizer_physical_PatternMatcher + quickstep_queryoptimizer_physical_Physical + quickstep_queryoptimizer_physical_PhysicalType + quickstep_queryoptimizer_physical_Selection + quickstep_queryoptimizer_physical_TableReference + quickstep_queryoptimizer_physical_TopLevelPlan + quickstep_queryoptimizer_rules_Rule + quickstep_utility_Macros) target_link_libraries(quickstep_queryoptimizer_rules_PushDownSemiAntiJoin quickstep_queryoptimizer_expressions_AttributeReference quickstep_queryoptimizer_expressions_ExpressionUtil @@ -225,6 +248,7 @@ target_link_libraries(quickstep_queryoptimizer_rules quickstep_queryoptimizer_rules_GenerateJoins quickstep_queryoptimizer_rules_PruneColumns quickstep_queryoptimizer_rules_PushDownFilter + quickstep_queryoptimizer_rules_PushDownLowCostDisjunctivePredicate quickstep_queryoptimizer_rules_PushDownSemiAntiJoin quickstep_queryoptimizer_rules_ReorderColumns quickstep_queryoptimizer_rules_Rule http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/b4291d7a/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp ---------------------------------------------------------------------- diff --git a/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp b/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp new file mode 100644 index 0000000..e39f155 --- /dev/null +++ b/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp @@ -0,0 +1,225 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp" + +#include +#include + +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp" +#include "query_optimizer/expressions/AttributeReference.hpp" +#include "query_optimizer/expressions/ExpressionUtil.hpp" +#include "query_optimizer/expressions/LogicalAnd.hpp" +#include "query_optimizer/expressions/LogicalOr.hpp" +#include "query_optimizer/expressions/PatternMatcher.hpp" +#include "query_optimizer/expressions/Predicate.hpp" +#include "query_optimizer/physical/Aggregate.hpp" +#include "query_optimizer/physical/HashJoin.hpp" +#include "query_optimizer/physical/NestedLoopsJoin.hpp" +#include "query_optimizer/physical/PatternMatcher.hpp" +#include "query_optimizer/physical/Physical.hpp" +#include "query_optimizer/physical/PhysicalType.hpp" +#include "query_optimizer/physical/Selection.hpp" +#include "query_optimizer/physical/TableReference.hpp" +#include "query_optimizer/physical/TopLevelPlan.hpp" + +#include "gflags/gflags.h" + +#include "glog/logging.h" + +namespace quickstep { +namespace optimizer { + +DEFINE_uint64(push_down_disjunctive_predicate_cardinality_threshold, 100u, + "The cardinality threshold for a stored relation for the " + "PushDownLowCostDisjunctivePredicate optimization rule to push " + "down a disjunctive predicate to pre-filter that relation."); + +namespace E = ::quickstep::optimizer::expressions; +namespace P = ::quickstep::optimizer::physical; + +P::PhysicalPtr PushDownLowCostDisjunctivePredicate::apply(const P::PhysicalPtr &input) { + DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan); + + const P::TopLevelPlanPtr top_level_plan = + std::static_pointer_cast(input); + cost_model_.reset( + new cost::StarSchemaSimpleCostModel( + top_level_plan->shared_subplans())); + + collectApplicablePredicates(input); + + if (!applicable_predicates_.empty()) { + // Apply the selected predicates to stored relations. + return attachPredicates(input); + } else { + return input; + } +} + +void PushDownLowCostDisjunctivePredicate::collectApplicablePredicates( + const physical::PhysicalPtr &input) { + P::TableReferencePtr table_reference; + if (P::SomeTableReference::MatchesWithConditionalCast(input, &table_reference)) { + // Consider only stored relations with small cardinality as targets. + if (cost_model_->estimateCardinality(input) <= + FLAGS_push_down_disjunctive_predicate_cardinality_threshold) { + applicable_nodes_.emplace_back(input, &table_reference->attribute_list()); + } + return; + } + + for (const auto &child : input->children()) { + collectApplicablePredicates(child); + } + + E::PredicatePtr filter_predicate = nullptr; + switch (input->getPhysicalType()) { + case P::PhysicalType::kAggregate: { + filter_predicate = + std::static_pointer_cast(input)->filter_predicate(); + break; + } + case P::PhysicalType::kHashJoin: { + const P::HashJoinPtr hash_join = + std::static_pointer_cast(input); + if (hash_join->join_type() == P::HashJoin::JoinType::kInnerJoin) { + filter_predicate = hash_join->residual_predicate(); + } + break; + } + case P::PhysicalType::kNestedLoopsJoin: { + filter_predicate = + std::static_pointer_cast(input)->join_predicate(); + break; + } + case P::PhysicalType::kSelection: { + filter_predicate = + std::static_pointer_cast(input)->filter_predicate(); + break; + } + default: + break; + } + + E::LogicalOrPtr disjunctive_predicate; + if (filter_predicate == nullptr || + !E::SomeLogicalOr::MatchesWithConditionalCast(filter_predicate, &disjunctive_predicate)) { + return; + } + + // Consider only disjunctive normal form, i.e. disjunction of conjunctions. + // Divide the disjunctive components into groups. + std::vector> candidate_predicates; + std::vector>> candidate_attributes; + for (const auto &conjunctive_predicate : disjunctive_predicate->operands()) { + candidate_predicates.emplace_back(); + candidate_attributes.emplace_back(); + E::LogicalAndPtr logical_and; + if (E::SomeLogicalAnd::MatchesWithConditionalCast(conjunctive_predicate, &logical_and)) { + for (const auto &predicate : logical_and->operands()) { + candidate_predicates.back().emplace_back(predicate); + candidate_attributes.back().emplace_back( + predicate->getReferencedAttributes()); + } + } else { + candidate_predicates.back().emplace_back(conjunctive_predicate); + candidate_attributes.back().emplace_back( + conjunctive_predicate->getReferencedAttributes()); + } + } + + // Check whether the conditions are met for pushing down part of the predicates + // to each small-cardinality stored relation. + for (const auto &node_pair : applicable_nodes_) { + const std::vector &target_attributes = *node_pair.second; + std::vector selected_disj_preds; + for (std::size_t i = 0; i < candidate_predicates.size(); ++i) { + const auto &cand_preds = candidate_predicates[i]; + const auto &cand_attrs = candidate_attributes[i]; + + std::vector selected_conj_preds; + for (std::size_t j = 0; j < cand_preds.size(); ++j) { + if (E::SubsetOfExpressions(cand_attrs[j], target_attributes)) { + selected_conj_preds.emplace_back(cand_preds[j]); + } + } + if (selected_conj_preds.empty()) { + // Not every disjunctive component contains a predicate that can be applied + // to the table reference node -- condition failed, exit. + selected_disj_preds.clear(); + break; + } else { + selected_disj_preds.emplace_back( + CreateConjunctive(selected_conj_preds)); + } + } + if (!selected_disj_preds.empty()) { + applicable_predicates_[node_pair.first].add( + CreateDisjunctive(selected_disj_preds)); + } + } +} + +P::PhysicalPtr PushDownLowCostDisjunctivePredicate::attachPredicates( + const P::PhysicalPtr &input) const { + std::vector new_children; + for (const P::PhysicalPtr &child : input->children()) { + const P::PhysicalPtr new_child = attachPredicates(child); + new_children.push_back(new_child); + } + + const P::PhysicalPtr output = + new_children == input->children() ? input + : input->copyWithNewChildren(new_children); + + const auto &node_it = applicable_predicates_.find(input); + if (node_it != applicable_predicates_.end()) { + const E::PredicatePtr filter_predicate = + CreateConjunctive(node_it->second.predicates); + return P::Selection::Create(output, + E::ToNamedExpressions(output->getOutputAttributes()), + filter_predicate); + } + + return output; +} + +E::PredicatePtr PushDownLowCostDisjunctivePredicate::CreateConjunctive( + const std::vector predicates) { + DCHECK_GE(predicates.size(), 1u); + if (predicates.size() == 1) { + return predicates.front(); + } else { + return E::LogicalAnd::Create(predicates); + } +} + +E::PredicatePtr PushDownLowCostDisjunctivePredicate::CreateDisjunctive( + const std::vector predicates) { + DCHECK_GE(predicates.size(), 1u); + if (predicates.size() == 1) { + return predicates.front(); + } else { + return E::LogicalOr::Create(predicates); + } +} + +} // namespace optimizer +} // namespace quickstep http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/b4291d7a/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp ---------------------------------------------------------------------- diff --git a/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp b/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp new file mode 100644 index 0000000..3e4b602 --- /dev/null +++ b/query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp @@ -0,0 +1,116 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_ +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_ + +#include +#include +#include +#include +#include +#include + +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp" +#include "query_optimizer/expressions/AttributeReference.hpp" +#include "query_optimizer/expressions/Predicate.hpp" +#include "query_optimizer/physical/Physical.hpp" +#include "query_optimizer/rules/Rule.hpp" +#include "utility/Macros.hpp" + +namespace quickstep { +namespace optimizer { + +/** \addtogroup OptimizerRules + * @{ + */ + +/** + * @brief Rule that applies to a physical plan to push down low-cost disjunctive + * predicate when proper conditions are met. + * + * Here we elaborate the conditions. + * + * Let + * P = p_{1,1} AND ... AND p_{1, m_1} OR ... OR p_{n,1} AND ... AND p_{n, m_n} + * be a predicate in disjunctive normal form. + * + * Now consider each small-cardinality relation R, if for each i in 1..n, there + * exists at least one predicate p_{i, k_i} that is applicable to R. Then we can + * construct a new predicate + * P' = p_{1, k_1} OR ... OR p_{n, k_n} + * and push down P' to be applied to R. + * + * Also, if any conjunctive component in P contains more than one predicate that + * is applicable to R, then we can combine all these applicable predicates as a + * conjunctive component in P'. + * + * Finally, note that if there exists a conjunctive component that contains no + * predicate applicable to R. Then the condition fails and we cannot do a push + * down for R. + */ +class PushDownLowCostDisjunctivePredicate : public Rule { + public: + /** + * @brief Constructor. + */ + PushDownLowCostDisjunctivePredicate() {} + + ~PushDownLowCostDisjunctivePredicate() override {} + + std::string getName() const override { + return "PushDownLowCostDisjunctivePredicate"; + } + + physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override; + + private: + struct PredicateInfo { + PredicateInfo() {} + inline void add(expressions::PredicatePtr predicate) { + predicates.emplace_back(predicate); + } + std::vector predicates; + }; + + void collectApplicablePredicates(const physical::PhysicalPtr &input); + + physical::PhysicalPtr attachPredicates(const physical::PhysicalPtr &input) const; + + static expressions::PredicatePtr CreateConjunctive( + const std::vector predicates); + + static expressions::PredicatePtr CreateDisjunctive( + const std::vector predicates); + + std::unique_ptr cost_model_; + + std::vector *>> applicable_nodes_; + std::map applicable_predicates_; + + DISALLOW_COPY_AND_ASSIGN(PushDownLowCostDisjunctivePredicate); +}; + +/** @} */ + +} // namespace optimizer +} // namespace quickstep + +#endif // QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_