Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 39CCA200C2A for ; Wed, 15 Feb 2017 07:10:20 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 3885A160B6A; Wed, 15 Feb 2017 06:10:20 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7FF49160B5F for ; Wed, 15 Feb 2017 07:10:19 +0100 (CET) Received: (qmail 62363 invoked by uid 500); 15 Feb 2017 06:10:18 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 62352 invoked by uid 99); 15 Feb 2017 06:10:18 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Feb 2017 06:10:18 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id C23AD189511 for ; Wed, 15 Feb 2017 06:10:17 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id Jm6OW4jDsA5Y for ; Wed, 15 Feb 2017 06:10:16 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 1065A5FAE6 for ; Wed, 15 Feb 2017 06:10:15 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v1F6AEZr029408; Wed, 15 Feb 2017 06:10:14 GMT Message-Id: <201702150610.v1F6AEZr029408@ip-10-146-233-104.ec2.internal> Date: Wed, 15 Feb 2017 06:10:14 +0000 From: "Alex Behm (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Marcel Kornacker , Dimitris Tsirogiannis Reply-To: alex.behm@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4916=3A_Fix_maintenance_of_set_of_item_sets_in_DisjointSet=2E=0A?= X-Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 X-Gerrit-ChangeURL: X-Gerrit-Commit: b552dfecc0e96b5b7a2f8b4ced7fc65d1013ee18 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Wed, 15 Feb 2017 06:10:20 -0000 Alex Behm has uploaded a new patch set (#2). Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. ...................................................................... IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. The bug: The DisjointSet maintains a set of unique item sets using a HashSet>. The problem is that we modified the Set elements after inserting them into the HashSet. This caused the removal of elements from the HashSet to fail. Removal is required for maintaining a consistent DisjointSet. The removal could even fail for the same Set instance because the hashCode() changed from when the Set was originally inserted to when the removal was attempted due to mutation of the Set. An inconsistent DisjointSet can lead to incorrect equivalence classes, which can lead to missing, redundant and even non-executable predicates. Incorrect results and crashes are possible. For most queries, an inconsistent DisjointSet does not alter the equivalence classes, and even fewer queries have incorrect plans. In fact, many of our existing planner tests trigger this bug, but only 3 of them lead to an incorrect value transfer graph, and all 3 had correct plans. The fix: Use an IdentityHashMap to store the set of item sets. It does not rely on the hashCode() and equals() of the stored elements, so the same object can be added and later removed, even when mutated in the meantime. Testing: - Added a Preconditions check in DisjointSet that asserts correct removal of an item set. Many of our existing tests hit the check before this fix. - Added a new unit test for DisjointSet which triggers the bug. - Augmented DisjointSet.checkConsistency() to check for inconsistency in the set of item sets. - Added validation of the value-transfer graph in single-node planner tests. - A private core/hdfs run succeeded. Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/util/DisjointSet.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/util/TestDisjointSet.java M testdata/workloads/functional-planner/queries/PlannerTest/joins.test 6 files changed, 65 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/5980/2 -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker