spark-issues mailing list archives

From "David Wood (JIRA)" <>
Subject [jira] [Created] (SPARK-14534) Should SparkContext.parallelize(List) take an Iterable instead?
Date Mon, 11 Apr 2016 13:28:25 GMT
David Wood created SPARK-14534:

             Summary: Should SparkContext.parallelize(List) take an Iterable instead?
                 Key: SPARK-14534
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.6.1
            Reporter: David Wood
            Priority: Minor

I am using MongoDB, and its driver returns an Iterable (not a List) for query results.  This is similar to a ResultSet in SQL and is done this way so that results can be processed row by row without pulling a potentially large result set into memory all at once.  It would be nice if parallelize(List) could instead operate on an Iterable, allowing similar efficiency.  Since a List is an Iterable, this would be backwards compatible.  However, I'm new to Spark, so I'm not sure whether that would violate some other design point.
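Until such an overload exists, one workaround is to materialize the Iterable into a List before calling JavaSparkContext.parallelize(List). A minimal sketch follows; the `mongoResults` Iterable below is a hypothetical stand-in for a MongoDB cursor, and note that this buffers the whole result set in memory, which is exactly the cost the issue hopes to avoid:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IterableToList {
    // Copy an Iterable's elements into a List so the result can be passed
    // to JavaSparkContext.parallelize(List<T>), which does not accept an
    // Iterable directly. This pulls the entire result set into memory.
    static <T> List<T> toList(Iterable<T> it) {
        List<T> out = new ArrayList<>();
        for (T elem : it) {
            out.add(elem);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a MongoDB query result Iterable.
        Iterable<String> mongoResults = Arrays.asList("doc1", "doc2", "doc3");
        List<String> docs = toList(mongoResults);
        // docs can now be handed to sc.parallelize(docs) on a JavaSparkContext.
        System.out.println(docs.size());
    }
}
```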

This message was sent by Atlassian JIRA
