spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vishnu Viswanath <>
Subject Adding new column to Dataframe
Date Wed, 25 Nov 2015 22:39:04 GMT

I am trying to add the row number to a spark dataframe.
This is my dataframe:

scala> df.printSchema
|-- line: string (nullable = true)

I tried to use df.withColumn but I am getting below exception.

scala> df.withColumn("row",rowNumber)
org.apache.spark.sql.AnalysisException: unresolved operator 'Project
[line#2326,'row_number() AS row#2327];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:174)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49)

Also, is it possible to add a column from one dataframe to another?
something like

scala> df.withColumn("line2",df2("line"))

org.apache.spark.sql.AnalysisException: resolved attribute(s)
line#2330 missing from line#2326 in operator !Project
[line#2326,line#2330 AS line2#2331];


Thanks and Regards,
Vishnu Viswanath
* <>*

View raw message