From: "Jo Desmet (JIRA)"
To: issues@spark.apache.org
Date: Thu, 1 Oct 2015 02:55:04 +0000 (UTC)
Subject: [jira] [Updated] (SPARK-10893) Lag Analytic function broken

[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jo Desmet updated SPARK-10893:
------------------------------

Description:

Trying to aggregate with the LAG analytic function gives the wrong result. In my test case it always returned the fixed value '103079215105' when run on an integer column.

Note that this happens only on Spark 1.5.0, and only when running in cluster mode. It works fine on Spark 1.4.1, or when running in local mode. I did not test on a YARN cluster, and I did not test other analytic aggregates.
Input JSON:

{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:

{code:title=Bar.java|borderStyle=solid}
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.hive.HiveContext;
import static org.apache.spark.sql.functions.lag;

SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)                       // was: dataFrame.col(...), an undefined variable
        .over(Window.orderBy(df.col("VAA")))
);
{code}


> Lag Analytic function broken
> ----------------------------
>
>                 Key: SPARK-10893
>                 URL: https://issues.apache.org/jira/browse/SPARK-10893
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.0
>         Environment: Spark Standalone Cluster on Linux
>            Reporter: Jo Desmet
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
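For reference, a minimal Spark-free sketch of the lag-by-1 semantics the snippet above relies on: each row's "previous" value is the VBB value of the preceding row in VAA order, with null for the first row. The `LagSketch` class, its `lag1` helper, and the hard-coded VBB ordering are illustrative assumptions, not Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LagSketch {
    // Shift each element down by one position: row i gets the value of
    // row i-1, and the first row gets null (no preceding row exists).
    static List<Integer> lag1(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add(i == 0 ? null : values.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // VBB values taken in VAA order A, B, C, d (null row omitted for brevity)
        List<Integer> vbb = Arrays.asList(1, -1, 2, 3);
        System.out.println(lag1(vbb)); // [null, 1, -1, 2]
    }
}
```

Under these semantics no input row can produce the constant '103079215105' reported above, which is what marks the 1.5.0 cluster-mode result as wrong.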