spark-user mailing list archives

From ChengBo <Cheng...@huawei.com>
Subject RE: Get statistic result from RDD
Date Tue, 20 Oct 2015 22:52:46 GMT
I tried, but it shows:
“error: value reduceByKey is not a member of Iterable[((Int, Int, String, String), String), Int]”

Best
Frank
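
That error appears because groupBy hands mapValues a plain Scala Iterable, and reduceByKey is defined only on RDDs of key/value pairs, not on local collections. One way to keep the existing groupBy is to count inside mapValues with a predicate on the Iterable; a minimal sketch against the webGroup defined in the code quoted further down (the names low and high are illustrative):

// reduceByKey lives on pair RDDs; inside mapValues the value is an Iterable,
// so count the matching rows with a predicate instead.
val res = webGroup.mapValues { rows =>
  val low  = rows.count(_._1 <= 5)   // rows with p(0) in 0..5
  val high = rows.count(_._1 >= 6)   // rows with p(0) in 6..7
  (low, high)
}

For large per-user groups, reducing on the RDD itself (as in the sketch after Ted's suggestion below) avoids materializing every row for a user at once.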

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Tuesday, October 20, 2015 3:46 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Please take a look at:
examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala

Cheers

On Tue, Oct 20, 2015 at 3:18 PM, ChengBo <Cheng.Bo@huawei.com> wrote:
Thanks, but I still don’t get it.
I have used groupBy to group the data by userID, and for each ID I need to compute the statistics.

Best
Frank

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Tuesday, October 20, 2015 3:12 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Your mapValues can emit a tuple: if p(0) is between 0 and 5, the first component of the
tuple would be 1 and the second 0; if p(0) is 6 or 7, the first component would be 0 and
the second 1.

You can then use reduceByKey to sum up the corresponding components.
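
A minimal sketch of this approach, reusing webItem from the code quoted below and assuming p(0) is an integer in 0..7 and p(3) is the user ID (the name perUser is illustrative; the pair-RDD operations come from the import org.apache.spark.SparkContext._ already in that code):

// Emit (userID, (1, 0)) or (userID, (0, 1)) per row, then sum component-wise.
val perUser = webItem.map { p =>
  val v = p(0).toInt
  (p(3), if (v <= 5) (1, 0) else (0, 1))
}.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
// perUser: RDD[(String, (Int, Int))] -- one row per userID with the two counts

This skips the groupBy entirely, so the per-user rows never have to be collected into one Iterable before counting.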

On Tue, Oct 20, 2015 at 1:33 PM, Shepherd <Cheng.Bo@huawei.com> wrote:
Hi all,

I am really new to Spark and Scala.
I cannot get the statistics I need from an RDD. Could someone help me with
this?
My current code is as follows:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val webFile = sc.textFile("/home/Dataset/web_info.csv")
webFile.cache()
// Split each CSV line into fields.
val webItem = webFile.map(line => line.split(","))
// (Long, Long, Long, String, String); p(3) is the user ID, and each
// user ID will have multiple rows.
val webEachRDD = webItem.map(p => (p(0).toLong, p(1).toLong, p(2).toLong, p(3), p(5)))

// Group by the user ID (the 4th tuple element).
val webGroup = webEachRDD.groupBy(_._4)

val res = webGroup.mapValues(v => {
        ....
        (wkd.count, wknd.count)
})

How can I write webGroup.mapValues so that I get each user ID's statistics?
For example, p(0) is an int between 0 and 7.
For each userID, I wish to count how many rows have p(0) between 0 and 5, and how
many have p(0) equal to 6 or 7.
In the final result, each row represents one userID's statistics.

Thanks a lot. I really appreciate it.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Get-statistic-result-from-RDD-tp25147.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

