spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thanigai Vellore <thanigai.vell...@gmail.com>
Subject Re: Spark Streaming - Collecting RDDs into array in the driver program
Date Thu, 26 Feb 2015 02:24:07 GMT
I didn't include the complete driver code but I do run the streaming
context from the main program which calls this function. Again, I can print
the red elements within the foreachrdd block but the array that is returned
is always empty. It appears that the function immediately returns even
before the foreachrdd stage is executed. Is that possible?
On Feb 25, 2015 5:41 PM, "Tathagata Das" <tdas@databricks.com> wrote:

> You are just setting up the computation here using foreacRDD. You have not
> even run the streaming context to get any data.
>
>
> On Wed, Feb 25, 2015 at 2:21 PM, Thanigai Vellore <
> thanigai.vellore@gmail.com> wrote:
>
>> I have this function in the driver program which collects the result from
>> rdds (in a stream) into an array and return. However, even though the RDDs
>> (in the dstream) have data, the function is returning an empty array...What
>> am I doing wrong?
>>
>> I can print the RDD values inside the foreachRDD call but the array is
>> always empty.
>>
>> def runTopFunction() : Array[(String, Int)] = {
>>         val topSearches = some function....
>>         val summary = new ArrayBuffer[(String,Int)]()
>>         topSearches.foreachRDD(rdd => {
>>             summary = summary.++(rdd.collect())
>>         })
>>
>>     return summary.toArray
>> }
>>
>>
>

Mime
View raw message