kylin-issues mailing list archives

From "Shaofeng SHI (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3644) NumberFormatException on null values when building cube with Spark
Date Tue, 30 Oct 2018 06:27:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668216#comment-16668216 ]

Shaofeng SHI commented on KYLIN-3644:
-------------------------------------

It seems that null values in the measure column are not handled by the Spark engine; I am adding
this to the 2.6 scope and we will take a look. Also, if you have fixed it locally, you are welcome
to contribute a patch. Thanks!
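
For illustration only, here is a minimal sketch of the kind of guard that could avoid the
failure. It is not the actual Kylin fix: the class and method names are hypothetical, and it
assumes that "\N" (the input string shown in the stack trace quoted below) is Hive's default
text marker for NULL and that an empty field should also be treated as null.

```java
public class NullSafeDoubleParse {
    // Assumption: "\N" is the NULL marker written by Hive's default text serialization.
    static final String HIVE_NULL = "\\N";

    // Hypothetical helper (not part of Kylin): returns null for null-like input
    // instead of letting Double.parseDouble throw NumberFormatException.
    public static Double parseOrNull(String raw) {
        if (raw == null || raw.isEmpty() || HIVE_NULL.equals(raw)) {
            return null;
        }
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        // A row from the sample CSV whose MONTANT field (index 3) is empty.
        String[] fields = "971;89;62-15-2-64-86;;;Client_Matricule_1;LYON;".split(";", -1);
        System.out.println("empty MONTANT -> " + parseOrNull(fields[3]));
        System.out.println("838.47034     -> " + parseOrNull("838.47034"));
        System.out.println("\\N            -> " + parseOrNull(HIVE_NULL));

        // Without the guard, the same input raises the exception from the report.
        try {
            Double.parseDouble(HIVE_NULL);
        } catch (NumberFormatException e) {
            System.out.println("unguarded parse failed: " + e.getMessage());
        }
    }
}
```

Whether a null measure value should be skipped or counted as zero in a SUM is a separate
design decision that this sketch does not address.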

> NumberFormatException on null values when building cube with Spark
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3644
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3644
>             Project: Kylin
>          Issue Type: Bug
>          Components: Spark Engine
>    Affects Versions: v2.5.0
>            Reporter: Hubert STEFANI
>            Priority: Major
>             Fix For: v2.6.0
>
>         Attachments: 00_zeppelin_notebook.jpg, 01_overview_table.jpg, 02_dimension_cube.jpg, 03_measure_cube.jpg, sortieData.csv
>
>
> We encounter an error every time we try to build a cube with the following steps:
>  * upload a CSV to AWS S3 in which the column the measure will be defined on has some null values (cf. attachment)
>  * create a Hive table with Spark
>  * create a model on top of this table
>  * create a cube with a SUM measure
>  * choose Spark as the engine
>  * launch the build
> Result: the build process fails at '#7 Step Name: Build Cube with Spark' with the following error:
>  
> """"""
> 18/10/23 09:25:39 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopDataset at SparkCubingByLayer.java:253, took 7,277136 s
> Exception in thread "main" java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkCubingByLayer. Root cause: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, ip-172-31-35-113.eu-west-1.compute.internal, executor 4): java.lang.NumberFormatException: For input string: "\N"
>     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
>     at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
>     at java.lang.Double.parseDouble(Double.java:538)
>     at org.apache.kylin.measure.basic.DoubleIngester.valueOf(DoubleIngester.java:38)
>     at org.apache.kylin.measure.basic.DoubleIngester.valueOf(DoubleIngester.java:28)
>     at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.buildValueOf(BaseCuboidBuilder.java:162)
>     at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.buildValueObjects(BaseCuboidBuilder.java:127)
>     at org.apache.kylin.engine.spark.SparkCubingByLayer$EncodeBaseCuboid.call(SparkCubingByLayer.java:297)
>     at org.apache.kylin.engine.spark.SparkCubingByLayer$EncodeBaseCuboid.call(SparkCubingByLayer.java:257)
>     at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
>     at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
> """""
> Note 1: the build process succeeds when run with the Map/Reduce engine.
> Note 2: the error does not seem to be related to the AWS environment.
>  
> Sample of csv :
> ID;CATEGORIE;TEL;MONTANT;MAGASIN;MATRICULE;VILLE;
> 970;161;6-98-6-6-42;838.47034;Magasin_19;Client_Matricule_28;MARSEILLE;
> 971;89;62-15-2-64-86;;;Client_Matricule_1;LYON;
> 972;87;17-64-97-74-42;;;Client_Matricule_105;ORBEC;
> 973;174;79-33-90-0-55;;Magasin_7;Client_Matricule_55;AJACCIO;
> 974;172;89-95-71-6-49;141.64174;Magasin_9;Client_Matricule_105;BASTIA;
> 975;83;7-27-95-28-7;897.28204;;Client_Matricule_199;AJACCIO;
> 976;170;67-72-18-29-34;164.07967;Magasin_3;Client_Matricule_137;LILLE;
> 977;130;14-69-4-23-27;1928.9557;Magasin_1;Client_Matricule_17;NOMNOM;
> 978;43;55-91-84-98-49;891.2691;Magasin_0;Client_Matricule_22;NOMNOM;
> 979;117;98-96-0-54-39;1636.3994;Magasin_9;Client_Matricule_142;MARSEILLE;
> 980;163;37-55-76-53-38;;;Client_Matricule_64;NEWYORK;
> 981;106;32-40-6-46-15;;Magasin_2;Client_Matricule_158;NOMNOM;
> 982;56;95-60-83-89-90;;;Client_Matricule_102;NOMNOM;
> 983;168;21-56-62-0-58;;;Client_Matricule_160;NOMNOM;
> 984;154;92-67-37-94-60;;;Client_Matricule_137;PARIS;
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
