spark-user mailing list archives

From Soumitra Kumar <kumar.soumi...@gmail.com>
Subject Re: Spark Streaming: DStream - zipWithIndex
Date Thu, 28 Aug 2014 19:48:48 GMT
Yes, that is an option.

I started with a function of batch time and index to generate the id as a Long. This may be faster
than generating a UUID, with the added benefit of sorting by time.
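A minimal sketch of that id scheme, assuming at most a million records per batch (the helper name and digit split are hypothetical, not from the thread):

```scala
// Pack the batch time (epoch millis) and a per-record index into one Long.
// Ids from later batches compare greater, so sorting by id sorts by time.
// Assumes index < 1,000,000; 1.4e12 ms * 1e6 stays well under Long.MaxValue.
def makeId(batchTimeMs: Long, index: Long): Long = {
  require(index >= 0 && index < 1000000L, "index must fit in the reserved low digits")
  batchTimeMs * 1000000L + index
}
```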

----- Original Message -----
From: "Tathagata Das" <tathagata.das1565@gmail.com>
To: "Soumitra Kumar" <kumar.soumitra@gmail.com>
Cc: "Xiangrui Meng" <mengxr@gmail.com>, user@spark.apache.org
Sent: Thursday, August 28, 2014 2:19:38 AM
Subject: Re: Spark Streaming: DStream - zipWithIndex


If you just want an arbitrary unique id attached to each record in a DStream (no ordering etc.), then
why not generate and attach a UUID to each record?
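A sketch of that approach, with a standalone tagging helper (the function name and pair layout are illustrative; with a DStream the usage would be something like `dstream.map(tagWithUuid)`):

```scala
import java.util.UUID

// Pair each record with a freshly generated random UUID string.
// UUIDs are unique with overwhelming probability but carry no ordering.
def tagWithUuid[T](record: T): (String, T) =
  (UUID.randomUUID().toString, record)
```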

On Wed, Aug 27, 2014 at 4:18 PM, Soumitra Kumar < kumar.soumitra@gmail.com > wrote:

I see an issue here.

If rdd.id is 1000 then rdd.id * 1e9.toLong would be BIG.

I wish there were a DStream mapPartitionsWithIndex.

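There is a workaround sketch for the missing method: DStream.transform exposes each batch RDD, which does have mapPartitionsWithIndex. The helper below is hypothetical; with a stream the usage would be roughly `dstream.transform(rdd => rdd.mapPartitionsWithIndex(tagPartition))`:

```scala
// Tag every element with the index of the partition it lives in.
// This is the per-partition function one would pass to mapPartitionsWithIndex.
def tagPartition[T](partIndex: Int, it: Iterator[T]): Iterator[(Int, T)] =
  it.map(x => (partIndex, x))
```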
On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng < mengxr@gmail.com > wrote: 

You can use RDD id as the seed, which is unique in the same spark 
context. Suppose none of the RDDs would contain more than 1 billion 
records. Then you can use 

rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid) 

Just a hack .. 
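
Checking the magnitude concern raised above: even with rdd.id = 1000 the offset is only 10^12, far below Long.MaxValue (about 9.2 * 10^18), so the hack does not overflow; ids only collide if a single RDD exceeds the assumed 1 billion records.

```scala
// The per-RDD offset in the hack: rdd.id * 1e9, evaluated as a Long.
val offset: Long = 1000L * 1e9.toLong
// Long.MaxValue is 9223372036854775807, so 10^12 leaves ample headroom.
```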

On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
< kumar.soumitra@gmail.com > wrote: 
> So, I guess zipWithUniqueId will be similar. 
> 
> Is there a way to get unique index? 
> 
> 
> On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng < mengxr@gmail.com > wrote: 
>> 
>> No. The indices start at 0 for every RDD. -Xiangrui 
>> 
>> On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar 
>> < kumar.soumitra@gmail.com > wrote: 
>> > Hello, 
>> > 
>> > If I do: 
>> > 
>> > dstream.transform { rdd => 
>> >   rdd.zipWithIndex.map { ... } 
>> > } 
>> > 
>> > Is the index guaranteed to be unique across all RDDs here? 
>> > 
>> > Thanks, 
>> > -Soumitra. 
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

