spark-issues mailing list archives

From "jalendhar Baddam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21299) except is throwing the fallowing exception after perform dropDuplicates on the Dataset object
Date Wed, 05 Jul 2017 05:50:02 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074279#comment-16074279 ]

jalendhar Baddam commented on SPARK-21299:
------------------------------------------

1. First, load data from any data source (CSV, text, or an RDBMS) into a DataFrame object.
2. Perform dropDuplicates on that Dataset, passing any one of its columns:
    ds = ds.dropDuplicates("col1");   // ds is a Dataset<Row>
3. Then take some of the rows from ds: ds1 = ds.limit(10);
4. Perform the except operation against the original ds, i.e. ds = ds.except(ds1); // the exception is thrown here
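
A note for reading the error quoted below: the message reports test_customer_CustID#569 as missing, while the list of available attributes contains test_customer_CustID#590, i.e. the same column name with a different expression ID. The following is only a minimal sketch for spotting such pairs in a "resolved attribute(s) ... missing from ..." message. The class and method names (AttrIdCheck, nameOf, sameNameDifferentId) are hypothetical helpers and not part of Spark, and the message in main is an abbreviated copy of the one in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttrIdCheck {
    // Matches the start of a message of the form
    // "resolved attribute(s) A#1,B#2 missing from C#3,D#4L in operator ..."
    private static final Pattern MESSAGE = Pattern.compile(
        "resolved attribute\\(s\\)\\s+(.+?)\\s+missing from\\s+(.+?)\\s+in operator");

    // Drops everything from the last '#' onward, leaving only the column name.
    static String nameOf(String attr) {
        int hash = attr.lastIndexOf('#');
        return hash < 0 ? attr : attr.substring(0, hash);
    }

    // Returns "missing -> available" pairs that share a name but differ in ID.
    static List<String> sameNameDifferentId(String message) {
        List<String> pairs = new ArrayList<>();
        Matcher m = MESSAGE.matcher(message);
        if (!m.find()) {
            return pairs; // not the expected message shape
        }
        for (String missing : m.group(1).split(",")) {
            for (String available : m.group(2).split(",")) {
                String a = missing.trim();
                String b = available.trim();
                if (nameOf(a).equals(nameOf(b)) && !a.equals(b)) {
                    pairs.add(a + " -> " + b);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the message from this issue (attribute list shortened).
        String message = "resolved attribute(s) test_customer_CustID#569 missing from "
            + "test_customer_ROW_NUM#589L,test_customer_CustID#590,test_customer_Gender#592 "
            + "in operator !Filter (cast(test_customer_CustID#569 as double) > cast(1000 as double));;";
        // Prints: [test_customer_CustID#569 -> test_customer_CustID#590]
        System.out.println(sameNameDifferentId(message));
    }
}
```

The sketch only compares names and IDs as text; it does not explain why the two sides of the plan carry different IDs.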



> except is throwing the fallowing exception after perform dropDuplicates on the Dataset object
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21299
>                 URL: https://issues.apache.org/jira/browse/SPARK-21299
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.0
>         Environment: spark 2.1.0
>            Reporter: jalendhar Baddam
>
> INFO: org.apache.spark.sql.AnalysisException: resolved attribute(s) test_customer_CustID#569 missing from test_customer_ROW_NUM#589L,test_customer_CustID#590,test_customer_Telephone#598L,test_customer_HouseholdID#593,test_customer_Gender#592,test_customer_Title#599,test_customer_Surname#597,test_customer_Occupation#596,test_customer_DOB#591,test_customer_Initials#595,test_customer_Income#594 in operator !Filter (cast(test_customer_CustID#569 as double) > cast(1000 as double));;
> INFO: Except
> INFO: :- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214,
test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218,
test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :  +- Sort [test_customer_ROW_NUM#212L ASC NULLS FIRST], true
> INFO: :     +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214,
test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218,
test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :        +- SubqueryAlias 1922a657-80bd-41a5-8e1f-04a248263e47
> INFO: :           +- Aggregate [test_customer_ROW_NUM#212L, test_customer_CustID#213,
test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217,
test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L,
test_customer_Title#222], [test_customer_ROW_NUM#212L, test_customer_CustID#213, test_customer_DOB#214,
test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217, test_customer_Initials#218,
test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L, test_customer_Title#222]
> INFO: :              +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213,
test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217,
test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L,
test_customer_Title#222]
> INFO: :                 +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213,
test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217,
test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L,
test_customer_Title#222]
> INFO: :                    +- Aggregate [test_customer_Gender#215], [first(test_customer_ROW_NUM#212L,
false) AS test_customer_ROW_NUM#212L, first(test_customer_CustID#213, false) AS test_customer_CustID#213,
first(test_customer_DOB#214, false) AS test_customer_DOB#214, test_customer_Gender#215, first(test_customer_HouseholdID#216,
false) AS test_customer_HouseholdID#216, first(test_customer_Income#217, false) AS test_customer_Income#217,
first(test_customer_Initials#218, false) AS test_customer_Initials#218, first(test_customer_Occupation#219,
false) AS test_customer_Occupation#219, first(test_customer_Surname#220, false) AS test_customer_Surname#220,
first(test_customer_Telephone#221L, false) AS test_customer_Telephone#221L, first(test_customer_Title#222,
false) AS test_customer_Title#222]
> INFO: :                       +- Project [test_customer_ROW_NUM#212L, test_customer_CustID#213,
test_customer_DOB#214, test_customer_Gender#215, test_customer_HouseholdID#216, test_customer_Income#217,
test_customer_Initials#218, test_customer_Occupation#219, test_customer_Surname#220, test_customer_Telephone#221L,
test_customer_Title#222]
> INFO: :                          +- Filter (cast(test_customer_CustID#213 as double) > cast(1000 as double))
> INFO: :                             +- Project [ROW_NUM#47L AS test_customer_ROW_NUM#212L,
CustID#48 AS test_customer_CustID#213, DOB#49 AS test_customer_DOB#214, Gender#50 AS test_customer_Gender#215,
HouseholdID#51 AS test_customer_HouseholdID#216, Income#52 AS test_customer_Income#217, Initials#53
AS test_customer_Initials#218, Occupation#54 AS test_customer_Occupation#219, Surname#55 AS
test_customer_Surname#220, Telephone#56L AS test_customer_Telephone#221L, Title#57 AS test_customer_Title#222]
> INFO: :                                +- SubqueryAlias customer
> INFO: :                                   +- Relation[ROW_NUM#47L,CustID#48,DOB#49,Gender#50,HouseholdID#51,Income#52,Initials#53,Occupation#54,Surname#55,Telephone#56L,Title#57]
parquet
> INFO: +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570,
test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573,
test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:    +- GlobalLimit 0
> INFO:       +- LocalLimit 0
> INFO:          +- Sort [test_customer_ROW_NUM#568L ASC NULLS FIRST], true
> INFO:             +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570,
test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573,
test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                +- SubqueryAlias 1922a657-80bd-41a5-8e1f-04a248263e47
> INFO:                   +- Aggregate [test_customer_ROW_NUM#568L, test_customer_CustID#569,
test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572,
test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L,
test_customer_Title#577], [test_customer_ROW_NUM#568L, test_customer_CustID#569, test_customer_DOB#570,
test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572, test_customer_Initials#573,
test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L, test_customer_Title#577]
> INFO:                      +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569,
test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572,
test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L,
test_customer_Title#577]
> INFO:                         +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569,
test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572,
test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L,
test_customer_Title#577]
> INFO:                            +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569,
test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572,
test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L,
test_customer_Title#577]
> INFO:                               +- Aggregate [test_customer_Gender#592], [first(test_customer_ROW_NUM#568L,
false) AS test_customer_ROW_NUM#568L, first(test_customer_CustID#569, false) AS test_customer_CustID#569,
first(test_customer_DOB#570, false) AS test_customer_DOB#570, test_customer_Gender#592, first(test_customer_HouseholdID#571,
false) AS test_customer_HouseholdID#571, first(test_customer_Income#572, false) AS test_customer_Income#572,
first(test_customer_Initials#573, false) AS test_customer_Initials#573, first(test_customer_Occupation#574,
false) AS test_customer_Occupation#574, first(test_customer_Surname#575, false) AS test_customer_Surname#575,
first(test_customer_Telephone#576L, false) AS test_customer_Telephone#576L, first(test_customer_Title#577,
false) AS test_customer_Title#577]
> INFO:                                  +- Project [test_customer_ROW_NUM#568L, test_customer_CustID#569,
test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572,
test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L,
test_customer_Title#577]
> INFO:                                     +- !Project [test_customer_ROW_NUM#568L, test_customer_CustID#569,
test_customer_DOB#570, test_customer_Gender#592, test_customer_HouseholdID#571, test_customer_Income#572,
test_customer_Initials#573, test_customer_Occupation#574, test_customer_Surname#575, test_customer_Telephone#576L,
test_customer_Title#577]
> INFO:                                        +- !Filter (cast(test_customer_CustID#569 as double) > cast(1000 as double))
> INFO:                                           +- Project [ROW_NUM#47L AS test_customer_ROW_NUM#589L,
CustID#48 AS test_customer_CustID#590, DOB#49 AS test_customer_DOB#591, Gender#50 AS test_customer_Gender#592,
HouseholdID#51 AS test_customer_HouseholdID#593, Income#52 AS test_customer_Income#594, Initials#53
AS test_customer_Initials#595, Occupation#54 AS test_customer_Occupation#596, Surname#55 AS
test_customer_Surname#597, Telephone#56L AS test_customer_Telephone#598L, Title#57 AS test_customer_Title#599]
> INFO:                                              +- SubqueryAlias customer
> INFO:                                                 +- Relation[ROW_NUM#47L,CustID#48,DOB#49,Gender#50,HouseholdID#51,Income#52,Initials#53,Occupation#54,Surname#55,Telephone#56L,Title#57]
parquet
> INFO: 
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:57)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:337)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:128)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:127)
> INFO: 	at scala.collection.immutable.List.foreach(List.scala:381)
> INFO: 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
> INFO: 	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:57)
> INFO: 	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
> INFO: 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
> INFO: 	at org.apache.spark.sql.Dataset.withSetOperator(Dataset.scala:2834)
> INFO: 	at org.apache.spark.sql.Dataset.except(Dataset.scala:1652)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
