Date: Tue, 12 Aug 2014 07:27:12 +0000 (UTC)
From: "Michael Armbrust (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-2967) Several SQL unit test failed when sort-based shuffle is enabled


     [ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-2967:
------------------------------------
    Priority: Critical  (was: Major)

> Several SQL unit test failed when sort-based shuffle is enabled
> ---------------------------------------------------------------
>
>                 Key: SPARK-2967
>                 URL: https://issues.apache.org/jira/browse/SPARK-2967
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>            Priority: Critical
>
> Several SQLQuerySuite unit tests fail when sort-based shuffle is enabled. It seems the SQL tests use GenericMutableRow, so the entries in ExternalSorter's internal buffer all end up referring to the same object because the row is mutable. It seems rows should be copied before they are fed into ExternalSorter.
> The errors are shown below. There are many failures, so I have pasted only some of them:
> {noformat}
> SQLQuerySuite:
> - SPARK-2041 column name equals tablename
> - SPARK-2407 Added Parser of SQL SUBSTR()
> - index into array
> - left semi greater than predicate
> - index into array of arrays
> - agg *** FAILED ***
>   Results do not match for query:
>   Aggregate ['a], ['a,SUM('b) AS c1#38]
>    UnresolvedRelation None, testData2, None
>
>   == Analyzed Plan ==
>   Aggregate [a#4], [a#4,SUM(CAST(b#5, LongType)) AS c1#38L]
>    SparkLogicalPlan (ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215)
>
>   == Physical Plan ==
>   Aggregate false, [a#4], [a#4,SUM(PartialSum#40L) AS c1#38L]
>    Exchange (HashPartitioning [a#4], 200)
>     Aggregate true, [a#4], [a#4,SUM(CAST(b#5, LongType)) AS PartialSum#40L]
>      ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215
>
>   == Results ==
>   !== Correct Answer - 3 ==   == Spark Answer - 3 ==
>   !Vector(1, 3)               [1,3]
>   !Vector(2, 3)               [1,3]
>   !Vector(3, 3)               [1,3] (QueryTest.scala:53)
> - aggregates with nulls
> - select *
> - simple select
> - sorting *** FAILED ***
>   Results do not match for query:
>   Sort ['a ASC,'b ASC]
>    Project [*]
>     UnresolvedRelation None, testData2, None
>
>   == Analyzed Plan ==
>   Sort [a#4 ASC,b#5 ASC]
>    Project [a#4,b#5]
>     SparkLogicalPlan (ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215)
>
>   == Physical Plan ==
>   Sort [a#4 ASC,b#5 ASC], true
>    Exchange (RangePartitioning [a#4 ASC,b#5 ASC], 200)
>     ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215
>
>   == Results ==
>   !== Correct Answer - 6 ==   == Spark Answer - 6 ==
>   !Vector(1, 1)               [3,2]
>   !Vector(1, 2)               [3,2]
>   !Vector(2, 1)               [3,2]
>   !Vector(2, 2)               [3,2]
>   !Vector(3, 1)               [3,2]
>   !Vector(3, 2)               [3,2] (QueryTest.scala:53)
> - limit
> - average
> - average overflow *** FAILED ***
>   Results do not match for query:
>   Aggregate ['b], [AVG('a) AS c0#90,'b]
>    UnresolvedRelation None, largeAndSmallInts, None
>
>   == Analyzed Plan ==
>   Aggregate [b#3], [AVG(CAST(a#2, LongType)) AS c0#90,b#3]
>    SparkLogicalPlan (ExistingRdd [a#2,b#3], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:215)
>
>   == Physical Plan ==
>   Aggregate false, [b#3], [(CAST(SUM(PartialSum#93L), DoubleType) / CAST(SUM(PartialCount#94L), DoubleType)) AS c0#90,b#3]
>    Exchange (HashPartitioning [b#3], 200)
>     Aggregate true, [b#3], [b#3,COUNT(CAST(a#2, LongType)) AS PartialCount#94L,SUM(CAST(a#2, LongType)) AS PartialSum#93L]
>      ExistingRdd [a#2,b#3], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:215
>
>   == Results ==
>   !== Correct Answer - 2 ==   == Spark Answer - 2 ==
>   !Vector(2.0, 2)             [2.147483645E9,1]
>   !Vector(2.147483645E9, 1)   [2.147483645E9,1] (QueryTest.scala:53)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
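
The aliasing described in the report can be reproduced outside Spark. What follows is a minimal Python sketch, not Spark code: MutableRow, rows, and buffer_rows are hypothetical stand-ins for a reused mutable row, a row-producing operator, and a sorter's in-memory buffer. The input is the six expected rows of the "sorting" test above. It only illustrates why buffering a reused mutable object without copying collapses every entry to the last value written, and makes no claim about how ExternalSorter is actually implemented.

```python
# Hypothetical stand-ins for illustration only; none of these are Spark classes.
class MutableRow:
    """A row whose fields are overwritten in place."""

    def __init__(self, size):
        self.values = [None] * size

    def update(self, *values):
        # Overwrite the existing fields instead of allocating a new row.
        self.values[:] = values
        return self

    def copy(self):
        # Return an independent snapshot of the current field values.
        row = MutableRow(len(self.values))
        row.values = list(self.values)
        return row


def rows(data):
    """Yield every record through one shared MutableRow instance."""
    shared = MutableRow(2)
    for a, b in data:
        yield shared.update(a, b)


def buffer_rows(row_iter, copy_rows):
    """Collect rows into a list first, as a buffering sorter would."""
    buffered = [row.copy() if copy_rows else row for row in row_iter]
    # Read the field values only after all rows have been buffered.
    return [tuple(row.values) for row in buffered]


data = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
aliased = buffer_rows(rows(data), copy_rows=False)
copied = buffer_rows(rows(data), copy_rows=True)
print(aliased)
print(copied)
```

Without copying, every buffered entry reads (3, 2), which matches the "Spark Answer" column of the failed sorting test. With a copy taken as each row enters the buffer, the six input rows come back unchanged.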