spark-dev mailing list archives

From pwendell <...@git.apache.org>
Subject [GitHub] spark pull request: Spark-1163, Added missing Python RDD functions
Date Sat, 08 Mar 2014 19:33:46 GMT
GitHub user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/92#discussion_r10410706
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False):
             jrdd = self._jrdd.coalesce(numPartitions)
             return RDD(jrdd, self.ctx, self._jrdd_deserializer)
     
    +    def name(self):
    +        """
    +        Return the name of this RDD.
    +        """
    +        name_ = self._jrdd.name()
    +        if not name_:
    +            return None
    +        return name_.encode('utf-8')
    +
    +    def setName(self, name):
    +        """
    +        Assign a name to this RDD.
    +        >>> rdd1 = sc.parallelize([1,2])
    +        >>> rdd1.setName('RDD1')
    +        >>> rdd1.name()
    +        'RDD1'
    +        """
    +        self._jrdd.setName(name)
    +
    +    def generator(self):
    +        """
    +        Return the generator of this RDD.
    +        """
    +        generator_ = self._jrdd.generator()
    +        if not generator_:
    +            return None
    +        return generator_.encode('utf-8')
    +
    +    def setGenerator(self, generator):
    +        """
    +        Reset generator of this RDD.
    +        >>> rdd1 = sc.parallelize([1,2])
    +        >>> rdd1.setGenerator('dummyRDDgenerator')
    +        >>> rdd1.generator()
    +        'dummyRDDgenerator'
    +        """
    +        self._jrdd.setGenerator(generator)
    +
    +    def toDebugString(self):
    +        """
    +        A description of this RDD and its recursive dependencies for debugging.
    +        """
    +        debug_string = self._jrdd.toDebugString()
    +        if not debug_string:
    +            return None
    +        return debug_string.encode('utf-8')
    +
    +    def getStorageLevel(self):
    --- End diff --
    
    Would you mind adding a `__repr__` method to the `StorageLevel` class so the user can
print the return value of this nicely:
    
    ```
        def __repr__(self):
            return "StorageLevel(%s, %s, %s, %s)" % (
                self.useDisk, self.useMemory, self.deserialized, self.replication)
    ```
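
    For illustration, here is a minimal, self-contained sketch of how that would behave.
The `StorageLevel` class below is a hypothetical stand-in, not the real pyspark class: its
constructor signature and the `replication=1` default are assumptions, and only the four
attribute names are taken from the suggested `__repr__`.

```python
class StorageLevel(object):
    """Hypothetical stand-in for pyspark's StorageLevel (illustration only)."""

    def __init__(self, useDisk, useMemory, deserialized, replication=1):
        # Assumed constructor; attribute names match the suggested __repr__.
        self.useDisk = useDisk
        self.useMemory = useMemory
        self.deserialized = deserialized
        self.replication = replication

    def __repr__(self):
        # Render the four flags in constructor order for readable output.
        return "StorageLevel(%s, %s, %s, %s)" % (
            self.useDisk, self.useMemory, self.deserialized, self.replication)


level = StorageLevel(True, True, False, 2)
print(repr(level))  # prints: StorageLevel(True, True, False, 2)
```

    Without `__repr__`, printing the object falls back to Python's default
`<... object at 0x...>` form, which says nothing about the flags.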


