From: "Jo Desmet (JIRA)"
To: issues@spark.apache.org
Date: Thu, 1 Oct 2015 02:55:04 +0000 (UTC)
Subject: [jira] [Updated] (SPARK-10893) Lag Analytic function broken

[ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jo Desmet updated SPARK-10893:
------------------------------

Description:

Trying to aggregate with the LAG analytic function gives the wrong result. In my test case it always returned the fixed value '103079215105' when run on an integer column.

Note that this happens only on Spark 1.5.0, and only when running in cluster mode. It works fine on Spark 1.4.1, or when running in local mode. I did not test on a YARN cluster, and I did not test other analytic aggregates.
Input JSON:

{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:

{code:title=Bar.java|borderStyle=solid}
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.hive.HiveContext;
import static org.apache.spark.sql.functions.lag;

SparkContext sc = new SparkContext(conf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame df = sqlContext.read().json(getInputPath("input.json"));

df = df.withColumn(
    "previous",
    lag(df.col("VBB"), 1)                       // was: dataFrame.col(...), an undefined variable
        .over(Window.orderBy(df.col("VAA")))
);
{code}


> Lag Analytic function broken
> ----------------------------
>
>                 Key: SPARK-10893
>                 URL: https://issues.apache.org/jira/browse/SPARK-10893
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.0
>         Environment: Spark Standalone Cluster on Linux
>            Reporter: Jo Desmet
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
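For reference, a minimal Spark-free sketch of the lag-by-1 semantics the snippet above relies on: each row's "previous" value is the VBB value of the preceding row in VAA order, with null for the first row. The `LagSketch` class, its `lag1` helper, and the hard-coded VBB ordering are illustrative assumptions, not Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LagSketch {
    // Shift each element down by one position: row i gets the value of
    // row i-1, and the first row gets null (no preceding row exists).
    static List<Integer> lag1(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add(i == 0 ? null : values.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // VBB values taken in VAA order A, B, C, d (null row omitted for brevity)
        List<Integer> vbb = Arrays.asList(1, -1, 2, 3);
        System.out.println(lag1(vbb)); // [null, 1, -1, 2]
    }
}
```

Under these semantics no input row can produce the constant '103079215105' reported above, which is what marks the 1.5.0 cluster-mode result as wrong.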