spark-issues mailing list archives

From "Liang-Chi Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases
Date Thu, 06 Apr 2017 23:59:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960001#comment-15960001 ]

Liang-Chi Hsieh commented on SPARK-20226:
-----------------------------------------

{{spark.sql.constraintPropagation.enabled}} is a SQL config flag. I am not sure if your local.conf
only covers Spark configuration via SparkConf. Can you explicitly set this flag in your application
through SQLConf?
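
A minimal sketch of what setting the flag through the SQL configuration could look like, assuming a Spark build that recognizes {{spark.sql.constraintPropagation.enabled}}. The application name and local master below are hypothetical and only there to make the snippet self-contained:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; appName and master are hypothetical.
val spark = SparkSession.builder()
  .appName("constraint-propagation-check")
  .master("local[*]")
  .getOrCreate()

// Set the SQL flag at runtime on the session's SQL configuration,
// rather than relying on a SparkConf-style properties file.
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")

// The same setting through the SQLContext used in the report.
spark.sqlContext.setConf("spark.sql.constraintPropagation.enabled", "false")

// Read the value back to confirm it is in effect before calling cacheTable.
println(spark.conf.get("spark.sql.constraintPropagation.enabled"))
```

Reading the value back before the call to cacheTable is a quick way to tell whether the setting reached the SQL configuration at all.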

> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
>                 Key: SPARK-20226
>                 URL: https://issues.apache.org/jira/browse/SPARK-20226
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: linux or windows
>            Reporter: Barry Becker
>              Labels: cache
>         Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily long time, depending on the number of columns referenced in a withColumn expression applied to a dataframe.
> The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the following:
> 1) Add a new column that references 8 - 14 of the columns in the dataset.
>    - If the expression references 8 columns, then the call to cacheTable is fast - like *5 seconds*
>    - If it references 11 columns, then it is slow - like *60 seconds*
>    - and if it references 14 columns, then it basically *takes forever* - I gave up after 10 minutes or so.
> 	The Column expression that is added basically just concatenates the columns into a single string. If a number is concatenated with a string (or vice versa), the number is first converted to a string.
>       The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` + `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` + `Penalty Amount` + `Interest Amount`
> {code}
> 	  which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type), UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation), UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)), UDF('Interest Amount))
> {code}
> 	 where the UDFs are very simple functions that basically call toString and + as needed.
> 2) Apply a previously saved pipeline that includes some transformers. Here are the steps of the pipeline (extracted from parquet):
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License Type_CLEANED__","handleInvalid":"skip","outputCol":"License Type_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_6f65ca9fa813",
> 	"paramMap":{
> 	  "outputCol":"Summons Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons Number_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_f5db4fb8120e",
>     "paramMap":{
> 	   "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
> 	    "handleInvalid":"keep","outputCol":"Issue Date_BINNED__","inputCol":"Issue Date_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_74568a2a5cfd",
> 	"paramMap":{
> 	  "handleInvalid":"keep","outputCol":"Fine Amount_BINNED__","inputCol":"Fine Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
> 	 }
> 	}{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_109705dfdbcd",
> 	"paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest Amount_CLEANED__"}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_2b2e3d8a324f",
> 	"paramMap":{
> 	   "handleInvalid":"keep","inputCol":"Reduction Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
> 	   "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
>      "uid":"bucketizer_4d44c2ebf489",
>      "paramMap":{
>        "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep",
> 	   "outputCol":"Payment Amount_BINNED__","inputCol":"Payment Amount_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_05a75eeef997",
> 	"paramMap":{
> 	   "handleInvalid":"keep",
> 	   "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
> 	   "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
>     "uid":"bucketizer_64b3ef2f97cf",
> 	"paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
>     "uid":"vecAssembler_932758a8f18e",
> 	"paramMap":{
> 	  "outputCol":"_features_column__",
> 	  "inputCols":["State_IDX__","License Type_IDX__","Violation_IDX__","County_IDX__","Issuing Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction Amount_BINNED__","Payment Amount_BINNED__","Amount Due_BINNED__","Precinct_BINNED__"]
> 	}
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
>     "uid":"nb_e4b24f3c08b0",
> 	"paramMap":{
> 	  "probabilityCol":"_class_probability_column__",
> 	  "labelCol":"Penalty Amount_BINNED__",
> 	  "predictionCol":"_prediction_column_",
> 	  "modelType":"multinomial",
> 	  "featuresCol":"_features_column__",
> 	  "rawPredictionCol":"rawPrediction",
> 	  "smoothing":3.518236190922951E-4
> 	 }
>    }{code}
>  - {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
>     "uid":"sql_1ea4c1b5c52e",
> 	"paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
>    }{code}
>    3) Call cacheTable on sqlContext. The actual code used is:
>    {code}
>     val key = "foo"
>     if (sqlContext.tableNames.contains(key))
>       sqlContext.dropTempTable(key)
>     df.createOrReplaceTempView(key)
>     sqlContext.cacheTable(key)        // this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery), I see that the query "planToCache" is very large (see below).
> I don't know much about query plans. Is this sort of giant nested query plan expected in this case? Is it in any way typical? Does it explain why it takes a very long time to cache? Why would adding just a few more columns to the added column's expression result in a plan that takes exponentially longer?
> {code}
> SubqueryAlias foo123, `foo123`
> +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue Date#127,
Violation Time#128, Violation#129, Judgment Entry Date#130, Fine Amount#131, Penalty Amount#132,
Interest Amount#133, Reduction Amount#134, Payment Amount#135, Amount Due#136, Precinct#137,
County#138, Issuing Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount
(predicted)#2363]
>    +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License
Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129,
Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine
Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest
Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
... 33 more fields]
>       +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License
Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129,
Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine
Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest
Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
... 33 more fields]
>          +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, `sql_1ea4c1b5c52e_5640c7097aca`
>             +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211,
Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction
Amount#134, ... 32 more fields]
>                +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211,
Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction
Amount#134, ... 31 more fields]
>                   +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211,
Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction
Amount#134, ... 30 more fields]
>                      +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211,
Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction
Amount#134, ... 29 more fields]
>                         +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211,
Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction
Amount#134, ... 28 more fields]
>                            +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211,
Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction
Amount#134, ... 27 more fields]
>                               +- Project [Plate#123, Plate_CLEANED__#162, State#124,
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 26 more fields]
>                                  +- Project [Plate#123, Plate_CLEANED__#162, State#124,
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 25 more fields]
>                                     +- Project [Plate#123, Plate_CLEANED__#162, State#124,
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 24 more fields]
>                                        +- Project [Plate#123, Plate_CLEANED__#162, State#124,
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields]
>                                           +- Project [Plate#123, Plate_CLEANED__#162,
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields]
>                                              +- Project [Plate#123, Plate_CLEANED__#162,
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields]
>                                                 +- Project [Plate#123, Plate_CLEANED__#162,
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields]
>                                                    +- Filter UDF(Violation Status_CLEANED__#174)
>                                                       +- Project [Plate#123, Plate_CLEANED__#162,
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields]
>                                                          +- Filter UDF(Issuing Agency_CLEANED__#173)
>                                                             +- Project [Plate#123, Plate_CLEANED__#162,
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty Amount#132,
Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields]
>                                                                +- Filter UDF(County_CLEANED__#172)
>                                                                   +- Project [Plate#123,
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164,
Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210,
Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167,
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212,
Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest
Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields]
>                                                                      +- Filter UDF(Violation_CLEANED__#167)
>                                                                         +- Project [Plate#123,
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164,
Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210,
Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167,
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212,
Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest
Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 16 more fields]
>                                                                            +- Filter
UDF(License Type_CLEANED__#164)
>                                                                               +- Project
[Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License
Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167,
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212,
Penalty Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, Interest
Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 15 more fields]
>                                                                                  +- Filter
UDF(State_CLEANED__#163)
>                                                                                     +-
Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
License Type_CLEANED__#164, CASE WHEN isnull(Summons Number#126) THEN NaN ELSE Summons Number#126
END AS Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210,
Violation Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167,
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212,
Penalty Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest Amount#133) THEN
NaN ELSE Interest Amount#133 END AS Interest Amount_CLEANED__#250, Interest Amount#133, CASE
WHEN isnull(Reduction Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction Amount_CLEANED__#251,
Reduction Amount#134, ... 14 more fields]
>                                                                                     
  +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
License Type_CLEANED__#164, Summons Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165)
THEN NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation Time#128,
Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130,
CASE WHEN isnull(Judgment Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168
END AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine Amount_CLEANED__#169)
THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine Amount_CLEANED__#212, Penalty Amount#132,
CASE WHEN isnull(Penalty Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170
END AS Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, Payment Amount#135,
Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                                     
     +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, State#124, UDF(State#124)
AS State_CLEANED__#163, License Type#125, UDF(License Type#125) AS License Type_CLEANED__#164,
Summons Number#126, Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165,
Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166, Violation#129,
UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry Date#130, cast(Judgment Entry
Date#130 as double) AS Judgment Entry Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131
as double) AS Fine Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double)
AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, Payment Amount#135,
Amount Due#136, Precinct#137, ... 9 more fields]
>                                                                                     
        +- Project [Plate#6 AS Plate#123, State#7 AS State#124, License Type#8 AS License
Type#125, Summons Number#9 AS Summons Number#126, Issue Date#10 AS Issue Date#127, Violation
Time#11 AS Violation Time#128, Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment
Entry Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty Amount#132,
Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS Reduction Amount#134, Payment
Amount#18 AS Payment Amount#135, Amount Due#19 AS Amount Due#136, Precinct#20 AS Precinct#137,
County#21 AS County#138, Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation
Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141]
>                                                                                     
           +- Project [Plate#6, State#7, License Type#8, Summons Number#9, Issue Date#10,
Violation Time#11, Violation#12, Judgment Entry Date#13, Fine Amount#14, Penalty Amount#15,
Interest Amount#16, Reduction Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21,
Issuing Agency#22, Violation Status#23, cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6,
State#7), License Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11),
Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), UDF(Penalty Amount#15)),
UDF(Interest Amount#16)) as string) AS columnBasedOnManyCols#43]
>                                                                                     
              +- Relation[Plate#6,State#7,License Type#8,Summons Number#9,Issue Date#10,Violation
Time#11,Violation#12,Judgment Entry Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction
Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing Agency#22,Violation
Status#23] csv
> {code}	



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

