spark-issues mailing list archives

From "Jo Desmet (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-10893) Lag Analytic function broken
Date Thu, 01 Oct 2015 02:57:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jo Desmet updated SPARK-10893:
------------------------------
    Description: 
Using the LAG analytic function gives the wrong result. In my test case it always returned the fixed value '103079215105' when applied to an integer column.
Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
It works fine on Spark 1.4.1, or when running in local mode.
I did not test on a YARN cluster.
I did not test other analytic aggregates.
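
For triage, the fixed value may be easier to read when split into two 32-bit halves: 103079215105 = 0x1800000001 = (24 << 32) | 1. The sketch below only checks that arithmetic in plain Java; the class and method names (DecodeLagValue, high32, low32) are illustrative, and nothing here establishes what Spark is actually reading.

{code:borderStyle=solid}
public class DecodeLagValue {
    // Upper 32 bits of a 64-bit value, read as unsigned.
    static long high32(long v) { return v >>> 32; }

    // Lower 32 bits of a 64-bit value, read as unsigned.
    static long low32(long v) { return v & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long observed = 103079215105L; // the constant returned by LAG in the failing runs
        System.out.println("hex  = 0x" + Long.toHexString(observed)); // 0x1800000001
        System.out.println("high = " + high32(observed));            // 24
        System.out.println("low  = " + low32(observed));             // 1
    }
}
{code}

Whether this bit pattern relates to the fault is untested; it is only a hint that the value does not look like a plausible VBB value.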

Input JSON (input.json):
{code:borderStyle=solid}
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}
{code}

Java:
{code:borderStyle=solid}
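    import static org.apache.spark.sql.functions.lag;
    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.expressions.Window;
    import org.apache.spark.sql.hive.HiveContext;

    // conf and getInputPath(...) come from the surrounding test code
    SparkContext sc = new SparkContext(conf);
    HiveContext sqlContext = new HiveContext(sc);
    DataFrame df = sqlContext.read().json(getInputPath("input.json"));

    // "previous" = VBB of the preceding row when rows are ordered by VAA
    df = df.withColumn(
      "previous",
      lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA")))
      );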
{code}
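
For comparison, here is a plain-Java sketch (no Spark involved) of the result I would expect from LAG(VBB, 1) OVER (ORDER BY VAA) on the input above. It assumes Spark's default ascending order, which sorts nulls first; the class LagReference and its helpers are hypothetical names for illustration, and rows come back in sorted order rather than input order.

{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LagReference {
    // One input record: VAA is the sort key, VBB the value to lag; either may be null.
    static final class Rec {
        final String vaa;
        final Integer vbb;
        Rec(String vaa, Integer vbb) { this.vaa = vaa; this.vbb = vbb; }
    }

    // LAG(vbb, offset) per row, after sorting by vaa ascending with nulls first (assumed Spark default).
    static List<Integer> lag(List<Rec> rows, int offset) {
        List<Rec> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Rec r) -> r.vaa,
                Comparator.nullsFirst(Comparator.<String>naturalOrder())));
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            // The first `offset` rows have no predecessor, so LAG yields null.
            out.add(i >= offset ? sorted.get(i - offset).vbb : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
                new Rec("A", 1), new Rec("B", -1), new Rec("C", 2),
                new Rec("d", 3), new Rec(null, null));
        System.out.println(lag(input, 1)); // [null, null, 1, -1, 2]
    }
}
{code}

So in VAA order (null, A, B, C, d) the expected "previous" values are null, null, 1, -1, 2; none of them should be 103079215105.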



> Lag Analytic function broken
> ----------------------------
>
>                 Key: SPARK-10893
>                 URL: https://issues.apache.org/jira/browse/SPARK-10893
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.0
>         Environment: Spark Standalone Cluster on Linux
>            Reporter: Jo Desmet
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
