spark-issues mailing list archives

From "Tom (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21402) Java encoders - switch fields on collectAsList
Date Fri, 04 Aug 2017 09:49:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114179#comment-16114179 ]

Tom commented on SPARK-21402:
-----------------------------

An additional comment: when there are multiple data types that are not easily converted to one another (e.g. double <-> string), the system goes crazy (and the instructions in the error message didn't help) - 

{code:bash}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x000000011be827bc, pid=992, tid=0x0000000000001c03
#
# JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# v  ~StubRoutines::jlong_disjoint_arraycopy
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
{code}
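
To illustrate, a minimal sketch of the kind of mismatch I mean (the bean and column names here are made up, not from the issue): a bean field typed String for a column that is actually a double, decoded with the bean encoder - 

{code:java}
// Made-up example: the DataFrame column "score" is a double in the schema,
// but the bean declares it as a String (not easily convertible).
public class MismatchedBean {
    private String userId;
    private String score;   // actual column type in the schema is double

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }
    public String getScore() { return score; }
    public void setScore(String score) { this.score = score; }
}

// Decoding such a bean is where I see the crash above:
// Dataset<MismatchedBean> ds = raw_df.as(Encoders.bean(MismatchedBean.class));
// List<MismatchedBean> lst = ds.collectAsList();
{code}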



> Java encoders - switch fields on collectAsList
> ----------------------------------------------
>
>                 Key: SPARK-21402
>                 URL: https://issues.apache.org/jira/browse/SPARK-21402
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>         Environment: mac os
> spark 2.1.1
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
>            Reporter: Tom
>
> I have the following schema in a dataset -
> root
>  |-- userId: string (nullable = true)
>  |-- data: map (nullable = true)
>  |    |-- key: string
>  |    |-- value: struct (valueContainsNull = true)
>  |    |    |-- startTime: long (nullable = true)
>  |    |    |-- endTime: long (nullable = true)
>  |-- offset: long (nullable = true)
>  And I have the following classes (+ setters and getters, which I omitted for simplicity) - 
> 
> {code:java}
> public class MyClass {
>     private String userId;
>     private Map<String, MyDTO> data;
>     private Long offset;
>  }
> public class MyDTO {
>     private long startTime;
>     private long endTime;
> }
> {code}
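> For reference, a sketch of the omitted accessors (standard JavaBean naming assumed); as far as I understand, Encoders.bean resolves the fields through these getters and setters, e.g. for MyDTO:
> {code:java}
> // Sketch of the omitted accessors, assuming standard JavaBean naming;
> // Encoders.bean maps columns to fields via these getters/setters.
> public long getStartTime() { return startTime; }
> public void setStartTime(long startTime) { this.startTime = startTime; }
> public long getEndTime() { return endTime; }
> public void setEndTime(long endTime) { this.endTime = endTime; }
> {code}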
> I collect the result the following way - 
> {code:java}
>         Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
>         Dataset<MyClass> results = raw_df.as(myClassEncoder);
>         List<MyClass> lst = results.collectAsList();
> {code}
>         
> I do several calculations to get the result I want, and the result is correct all the way through before I collect it.
> This is the result for - 
> {code:java}
> results.select(results.col("data").getField("2017-07-01").getField("startTime")).show(false);
> {code}
> +--------------------------+------------------------+
> |data[2017-07-01].startTime|data[2017-07-01].endTime|
> +--------------------------+------------------------+
> |1498854000                |1498870800              |
> +--------------------------+------------------------+
> This is the result after collecting the results for - 
> {code:java}
> MyClass userData = results.collectAsList().get(0);
> MyDTO userDTO = userData.getData().get("2017-07-01");
> System.out.println("userDTO startTime: " + userDTO.getStartTime());
> System.out.println("userDTO endTime: " + userDTO.getEndTime());
> {code}
> --
> data startTime: 1498870800
> data endTime: 1498854000
> I tend to believe it is a Spark issue. Would love any suggestions on how to bypass it.
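> For completeness, a sketch (untested, column names follow the schema above) of reading the same fields from untyped Rows, which bypasses the bean encoder entirely - 
> {code:java}
> // Untested sketch: collect plain Rows instead of beans and read the
> // nested struct positionally (startTime, endTime per the schema above).
> org.apache.spark.sql.Row first = raw_df.collectAsList().get(0);
> java.util.Map<String, org.apache.spark.sql.Row> data =
>         first.getJavaMap(first.fieldIndex("data"));
> org.apache.spark.sql.Row dto = data.get("2017-07-01");
> System.out.println("startTime: " + dto.getLong(0));
> System.out.println("endTime: " + dto.getLong(1));
> {code}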



