spark-dev mailing list archives

From zml张明磊 <mingleizh...@Ctrip.com>
Subject Re: Re: How can I get the column data based on specific column name and then stored these data in array or list ?
Date Fri, 25 Dec 2015 08:07:40 GMT
Yes, it's a good method. But what is a UDF? U...D...F?
OK, I can learn from it.

Thanks,
Minglei.

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: December 25, 2015 16:00
To: zml张明磊
Cc: dev@spark.apache.org
Subject: Re: Re: How can I get the column data based on specific column name and then stored these data in array or list ?

You can use a UDF (user-defined function) to convert one column to an array type. Here's a sample:


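import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.expr

val conf = new SparkConf().setMaster("local[4]").setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
import sqlContext._
// Register a UDF named "f" that turns a string into a two-element array
sqlContext.udf.register("f", (a: String) => Array(a, a))
val df1 = Seq(
  (1, "jeff", 12),
  (2, "andy", 34),
  (3, "pony", 23),
  (4, "jeff", 14)
).toDF("id", "name", "age")

// Replace the string column "name" with the array produced by the UDF
val df2 = df1.withColumn("name", expr("f(name)"))
df2.printSchema()
df2.show()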
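
Note that the UDF above builds an array per row; it does not gather a whole column. If the goal is to bring every value of one column back to the driver as a local array, one possible approach is to select the column by name and call collect(). The following is only a minimal sketch, assuming the same Spark 1.x SQLContext API used in this thread, a string-typed column, and a column small enough to fit in driver memory. The names ColumnToArray and columnAsStrings are hypothetical and chosen for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object ColumnToArray {
  // Hypothetical helper: select one column by name, collect the rows
  // to the driver, and read the single field of each Row as a String.
  // Assumes the column holds strings; other types need a different getter.
  def columnAsStrings(df: DataFrame, colName: String): Array[String] =
    df.select(colName).collect().map(_.getString(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("column-to-array")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Same sample data as in the UDF example
    val df = Seq(
      (1, "jeff", 12),
      (2, "andy", 34),
      (3, "pony", 23),
      (4, "jeff", 14)
    ).toDF("id", "name", "age")

    val names = columnAsStrings(df, "name")
    println(names.mkString(", "))

    sc.stop()
  }
}
```

Because collect() copies all rows to the driver, this is only reasonable for columns of modest size; for large data it is usually better to keep working with the DataFrame itself.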

On Fri, Dec 25, 2015 at 3:44 PM, zml张明磊 <mingleizhang@ctrip.com> wrote:
Thanks, Jeff. It's not about choosing some columns of a Row. I just want to take all the data
in one column and convert it to an Array. Do you understand what I mean?

In other words (translated from Chinese):
I want to select all the data in this column based on the column name, and then put it into an array.


From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: December 25, 2015 15:39
To: zml张明磊
Cc: dev@spark.apache.org
Subject: Re: How can I get the column data based on specific column name and then stored these data in array or list ?

Not sure what you mean. Do you want to choose some columns of a Row and convert them to an
Array?

On Fri, Dec 25, 2015 at 3:35 PM, zml张明磊 <mingleizhang@ctrip.com> wrote:

Hi,

       I am new to Scala and Spark and am trying to find the relevant DataFrame API to solve
the problem described in the title. However, the only API I have found is
DataFrame.col(colName: String): Column, which returns a Column object, not its content. If
DataFrame supported something like Column.toArray: Type, that would be enough for me, but
it doesn't. How can I achieve this?

Thanks,
Minglei.



--
Best Regards

Jeff Zhang


