spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Verma, Rishi (398J)" <>
Subject Converting a DStream's RDDs to SchemaRDDs
Date Thu, 28 Aug 2014 17:28:54 GMT
Hi Folks,

I’d like to find out tips on how to convert the RDDs inside a Spark Streaming DStream to
a set of SchemaRDDs.

My DStream contains JSON data pushed over from Kafka, and I’d like to use SparkSQL’s JSON
import function (i.e. jsonRDD) to register the JSON dataset as a table, and perform queries
on it.

Here’s a code snippet of my latest attempt (in Scala):
val sc = new SparkContext(conf)
val ssc = new StreamingContext("local", this.getClass.getName, Seconds(1))

val stream = KafkaUtils.createStream(ssc, "localhost:2181", “group", Map(“topic" ->
val sql = new SQLContext(sc)

stream.foreachRDD(rdd => {
	if (rdd.count > 0) {
		// message received
		val sqlRDD = sql.jsonRDD(rdd)
	} else {
		println(“No message received")

This compiles and runs when I submit it to Spark (local-mode); however, I never seem to be
able to successfully see a schema printed on my console, via the “sqlRDD.printSchema()”
method when Kafka is streaming my JSON messages to the “topic” topic name. I know my JSON
is valid and my Kafka connection works fine, I’ve been able to print the stream messages
in their raw format, just not as SchemaRDDs.  

Any tips? Suggestions?

Thanks much,
Rishi Verma
NASA Jet Propulsion Laboratory
California Institute of Technology

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message