spark-user mailing list archives

From mj <jone...@gmail.com>
Subject Re: Appending an incrental value to each RDD record
Date Tue, 16 Dec 2014 16:01:45 GMT
You could try using zipWithIndex (links to the API docs are below). For
example, in Python:

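# Assumes `sc` is an existing SparkContext (e.g. from the pyspark shell).
items = ['a', 'b', 'c']
items2 = sc.parallelize(items)
print(items2.first())

# Build (value, value + "!") pairs as a stand-in for real records.
items3 = items2.map(lambda x: (x, x + "!"))
print(items3.first())

# zipWithIndex pairs each record with its position: (record, index).
items4 = items3.zipWithIndex()
print(items4.first())

# Swap the pair so the index comes first: (index, record).
items5 = items4.map(lambda x: (x[1], x[0]))
print(items5.first())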


The last print will give you an output of (0, ('a', 'a!')), where the 0 is the
index. You could also use a map to shift the index by a fixed value (e.g. if
you wanted to count from 1).
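
As a rough sketch of that last point, the swap and the shift can be done in a
single map. The helper name with_index_from and its start parameter are made up
for illustration and are not part of the Spark API; it only assumes it is given
an RDD, which provides zipWithIndex() and map().

```python
def with_index_from(rdd, start=1):
    """Return (index, record) pairs, with the index counting up from `start`.

    Hypothetical helper, not a Spark function. `rdd` is assumed to be an RDD
    such as items3 in the example above.
    """
    # zipWithIndex yields (record, index) with a 0-based index, so swap the
    # pair and add the offset in the same map.
    return rdd.zipWithIndex().map(lambda pair: (pair[1] + start, pair[0]))

# Usage, assuming items3 from the example above:
#   items6 = with_index_from(items3, start=1)
#   print(items6.first())
```

With start=1 the first record should come back with index 1 rather than 0.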

Links
http://spark.apache.org/docs/latest/api/python/index.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Appending-an-incrental-value-to-each-RDD-record-tp20718p20720.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.