spark-dev mailing list archives

From Saurabh Adhikary <adhikarysaur...@gmail.com>
Subject Dataframe replace 'collect()' going in indefinite time loop
Date Wed, 03 May 2017 20:23:43 GMT
final_schema_noise_data = sqlContext.createDataFrame(noise_data_parts, noise_data_struct_schema)

for a_name in name_field_names:
    final_schema_noise_data = final_schema_noise_data.withColumn(a_name, spaceDeleteUDF(a_name))
    #--- up to here, final_schema_noise_data.collect() works ---
    for t in noise_chars:
        final_schema_noise_data = final_schema_noise_data.na.replace(t, '', a_name)
        print a_name, t
#The loop above completes, but final_schema_noise_data.collect() does not
yield any result: the cursor moves to the next line and some processing runs
for hours with no output.
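One plausible explanation, offered as an assumption rather than a confirmed diagnosis: each na.replace call rewrites the column as a CASE WHEN col = t THEN '' ELSE col END style expression, which refers to the column twice. When the optimizer collapses the stacked projections at collect() time, it may inline the previous expression into both places. The simplified model below (inlined_size is a hypothetical helper, not a Spark API) counts the resulting column references. Under this model they double with every replace, which would fit the symptom: building the DataFrame stays cheap, but collect() stalls.

```python
def inlined_size(num_replaces: int) -> int:
    """Count column references in one column's expression after collapsing
    num_replaces stacked replace projections.

    Simplified model (an assumption, not Spark's actual code): every replace
    step rewrites column c as CASE WHEN c = t THEN '' ELSE c END, so the
    previous expression is inlined twice (once in WHEN, once in ELSE).
    """
    refs = 1  # the bare column reference before any replace
    for _ in range(num_replaces):
        refs *= 2  # previous expression appears in both the WHEN and ELSE branch
    return refs


if __name__ == "__main__":
    # Number of replaces per column (e.g. len(noise_chars)) vs. model size
    for k in (1, 5, 10, 20):
        print(k, inlined_size(k))
```

Under this model, 20 noise characters would already give 2**20 (about a million) references for a single column. The cost would then grow with the number of noise characters rather than with the data size.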

#Before the inner for loop, df.collect() returns output within seconds; after
the loop completes, it produces no output for hours.
*Is there any known issue with the df.na.replace function?*
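The sketch below shows a possible workaround. It assumes the goal is to strip single noise characters from inside each string. DataFrame.na.replace compares whole cell values, so replacing "#" leaves a value such as "a#b" unchanged. A single regexp_replace per column with one character-class pattern is closer to that intent, and it adds only one projection per column instead of one per character. The helpers noise_pattern and strip_noise are hypothetical. strip_noise uses Python's re module only to demonstrate the pattern. In Spark, the same pattern string would be passed to pyspark.sql.functions.regexp_replace, which uses Java regex syntax.

```python
import re
from typing import Iterable


def noise_pattern(noise_chars: Iterable[str]) -> str:
    """Build one regex character class matching any single noise character.

    Assumes every entry is a single character. re.escape protects characters
    such as ']', '^', '-' and '\\' that would otherwise change the class.
    """
    chars = list(noise_chars)
    if not chars:
        # "[]" is not a valid character class, so refuse an empty input
        raise ValueError("noise_chars must not be empty")
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def strip_noise(value: str, noise_chars: Iterable[str]) -> str:
    """Remove every occurrence of every noise character in a single pass."""
    return re.sub(noise_pattern(noise_chars), "", value)


if __name__ == "__main__":
    noise = ["#", "$", "]", "-"]
    print(noise_pattern(noise))
    print(strip_noise("a#b$c]d-e", noise))
```

In the original loop, the inner for loop could then become one call per column, for example final_schema_noise_data.withColumn(a_name, regexp_replace(a_name, noise_pattern(noise_chars), '')). If exact whole-value replacement really is intended, na.replace also accepts a dict mapping each old value to its replacement. That builds one CASE expression per column rather than one projection per character.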



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Dataframe-replace-collect-going-in-indefinite-time-loop-tp21492.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
