spark-reviews mailing list archives

From dusenberrymw <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-6227] [MLlib] [PySpark] Implement PySpa...
Date Fri, 07 Aug 2015 19:28:45 GMT
Github user dusenberrymw commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7963#discussion_r36552903
  
    --- Diff: python/pyspark/mllib/linalg/distributed.py ---
    @@ -352,6 +458,56 @@ def toBlockMatrix(self, rowsPerBlock=1024, colsPerBlock=1024):
                                                                colsPerBlock)
             return BlockMatrix(java_block_matrix, rowsPerBlock, colsPerBlock)
     
    +    def computeSVD(self, k, computeU=False, rCond=1e-9):
    +        """
    +        Computes the singular value decomposition of the IndexedRowMatrix.
    +
    +        The given row matrix A of dimension (m X n) is decomposed into U * s * V' where
    +
    +        * U: (m X k) (left singular vectors) is an IndexedRowMatrix whose columns are
    +             the eigenvectors of (A X A')
    +        * s: DenseVector consisting of square root of the eigenvalues (singular values)
    +             in descending order.
    +        * V: (n X k) (right singular vectors) is a Matrix whose columns are the
    +             eigenvectors of (A' X A)
    +
    +        For more details on the implementation, please refer to the Scala documentation.
    +
    +        :param k: Number of singular values to keep.
    +        :param computeU: Whether or not to compute U. If True, then U is computed
    +                         by A * V * s^-1.
    +        :param rCond: Reciprocal condition number. All singular values smaller than
    +                      rCond * s[0] are treated as zero, where s[0] is the largest
    +                      singular value.
    +        :returns: SingularValueDecomposition object
    +
    +        >>> data = [(0, (3, 1, 1)), (1, (-1, 3, 1))]
    +        >>> irm = IndexedRowMatrix(sc.parallelize(data))
    +        >>> svd_model = irm.computeSVD(2, True)
    +        >>> svd_model.U.rows.collect() # doctest: +NORMALIZE_WHITESPACE
    +        [IndexedRow(0, [-0.707106781187,0.707106781187]),\
    +        IndexedRow(1, [-0.707106781187,-0.707106781187])]
    +        >>> svd_model.s
    +        DenseVector([3.4641, 3.1623])
    +        >>> svd_model.V
    +        DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)
    +        """
    +        j_model = self._java_matrix_wrapper.call("computeSVD", int(k), computeU, float(rCond))
    --- End diff --
    
    Same as above with `computeU`.
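
    As a side note, the U = A * V * s^-1 relation documented in the diff above can be checked in plain NumPy (a standalone sketch, not PySpark; the matrix values are taken from the doctest):

    ```python
    import numpy as np

    # Standalone NumPy sketch (not PySpark) of the relation stated in the
    # docstring: when computeU=True, U is recovered as A * V * s^-1.
    # The matrix rows match the doctest data: (3, 1, 1) and (-1, 3, 1).
    A = np.array([[3.0, 1.0, 1.0],
                  [-1.0, 3.0, 1.0]])

    # Reference thin SVD from NumPy (k = min(m, n) = 2 singular triplets).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T

    # U reconstructed via the docstring formula: A @ V @ diag(1/s).
    U_from_formula = A @ V @ np.diag(1.0 / s)

    assert np.allclose(U, U_from_formula)
    # Singular values agree with the doctest output [3.4641, 3.1623]:
    assert np.allclose(s, [np.sqrt(12.0), np.sqrt(10.0)])
    ```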

