spark-issues mailing list archives

From "Cheng Lian (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes
Date Fri, 03 Jun 2016 01:14:59 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved SPARK-15732.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Resolved by https://github.com/apache/spark/pull/13485

> Dataset generated code "generated.java" Fails with Certain Case Classes
> -----------------------------------------------------------------------
>
>                 Key: SPARK-15732
>                 URL: https://issues.apache.org/jira/browse/SPARK-15732
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Version 2.0 Preview on the Databricks Community Edition
>            Reporter: Sanjay Dasgupta
>            Assignee: Wenchen Fan
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> The Dataset code generation logic fails to handle case class field names that are also
> Java keywords (e.g. "abstract"). Scala's backquote escaping allows Java (and Scala) keywords
> to be used as identifiers, as in the example below:
> case class PatApp(number: Int, title: String, `abstract`: String)
> But this case class trips up the Dataset code generator. The following error is raised
> when a Dataset containing instances of such a case class is processed:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "."
> The following code reproduces the problem. It was run on the Databricks Community Edition,
> in a Scala notebook, in 3 separate cells as shown below:
> // CELL 1:
> //
> // Create a Case Class with "abstract" as a field-name ...
> //
> package keywordissue
> // The field-name abstract is a Java keyword ...
> case class PatApp(number: Int, title: String, `abstract`: String)
> // CELL 2:
> //
> // Create a Dataset using the case class ...
> //
> import keywordissue.PatApp
> val applications = List(
>   PatApp(1001, "1001", "Abstract 1001"),
>   PatApp(1002, "1002", "Abstract 1002"),
>   PatApp(1003, "1003", "Abstract for 1003"),
>   PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
> val appsDataset = sc.parallelize(applications).toDF.as[PatApp]
> // CELL 3:
> //
> // Force Dataset code-generation. This causes the error message to display ...
> //
> val duplicates = appsDataset.groupByKey(_.number).mapGroups((k, i) => (k, i.length)).filter(_._2 > 0)
> duplicates.collect().foreach(println)
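
The mismatch described in the report can be seen without Spark. The sketch below is only an illustration and is not the fix applied in the pull request: it assumes a plain JVM with Scala, and the object `KeywordFieldCheck` and helper `javaKeywordFields` are hypothetical names introduced here. It shows that a backquoted Scala field keeps its plain name (`abstract`) on the JVM, which is legal in bytecode but cannot follow a "." in Java source, consistent with the "Unexpected selector 'abstract'" compile error above.

```scala
import javax.lang.model.SourceVersion

// Hypothetical helper for illustration; not part of Spark.
object KeywordFieldCheck {
  // Same shape as the case class from the report.
  case class PatApp(number: Int, title: String, `abstract`: String)

  // Returns the declared field names of `cls` that are reserved words in Java source.
  // SourceVersion.isKeyword is the JDK's own check for Java keywords and literals.
  def javaKeywordFields(cls: Class[_]): List[String] =
    cls.getDeclaredFields.toList
      .map(_.getName)
      .filter(name => SourceVersion.isKeyword(name))

  def main(args: Array[String]): Unit = {
    val app = PatApp(1001, "1001", "Abstract 1001")
    // Scala source needs backquotes to reach the field ...
    println(app.`abstract`)
    // ... but the compiled class exposes it under the bare keyword name.
    println(javaKeywordFields(classOf[PatApp]))
  }
}
```

A check like this could be run over a case class before using it in a Dataset on an affected version; for `PatApp` it should report only `abstract`.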



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

