spark-dev mailing list archives

From Madhu
Subject RDD data flow
Date Tue, 16 Dec 2014 17:09:54 GMT
I was looking at some of the Partition implementations in core/rdd and
getOrCompute(...) in CacheManager.
It appears that getOrCompute(...) returns an InterruptibleIterator, which
delegates to a wrapped Iterator.
That would imply that Partitions should extend Iterator, but that is not
always the case.
For example, these Partitions for these RDDs do not extend Iterator:

Why is that? Shouldn't all Partitions be Iterators? Clearly I'm missing something.
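
For reference, the wrapper-and-delegate arrangement mentioned above can be
sketched in a few lines. This is only a toy model in Python, not Spark's Scala
code: the names TaskContext and TaskKilled and the interrupted flag are
stand-ins assumed for illustration, and the sketch says nothing about where the
wrapped iterator comes from.

```python
class TaskContext:
    """Hypothetical stand-in for a task's context; it only tracks interruption."""

    def __init__(self):
        self.interrupted = False


class TaskKilled(Exception):
    """Hypothetical exception raised when an interrupted task is iterated."""


class InterruptibleIterator:
    """Wraps an iterator and checks the context before handing out each element."""

    def __init__(self, context, delegate):
        self.context = context
        self.delegate = iter(delegate)

    def __iter__(self):
        return self

    def __next__(self):
        # Stop early if the task was interrupted; otherwise forward to the delegate.
        if self.context.interrupted:
            raise TaskKilled("task was interrupted")
        return next(self.delegate)


context = TaskContext()
records = InterruptibleIterator(context, iter([1, 2, 3]))

first = next(records)        # forwarded to the wrapped iterator
context.interrupted = True   # simulate the task being cancelled

try:
    next(records)
    killed = False
except TaskKilled:
    killed = True
```

The point of the sketch is that the wrapper only needs something that behaves
like an iterator to delegate to; it does not care what produced it.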

On a related subject, I was thinking of documenting the data flow of RDDs in
more detail. The code is not hard to follow, but it's nice to have a simple
picture with the major components and some explanation of the flow.  The
declaration of Partition is throwing me off.

