systemml-dev mailing list archives

From "Niketan Pansare" <npan...@us.ibm.com>
Subject Re: API documentation for SystemML
Date Mon, 07 Dec 2015 23:31:55 GMT

Thanks Deron for your response :)

Sourav: A few additional comments:
1. MLContext allows users to pass RDDs to SystemML, and MLOutput allows
them to fetch the result RDDs after a DML script has executed.

2. MLContext exposes a registerInput("variableName", RDD) interface, while
MLOutput provides get..("variableName") methods, e.g. getDF,
getBinaryBlockedRDD, and so on.
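As a rough sketch of that flow (the DML script path and variable names here
are hypothetical, and the exact method signatures should be checked against
the MLContext programming guide):

```scala
import org.apache.sysml.api.MLContext

// sc is the existing SparkContext (or JavaSparkContext)
val ml = new MLContext(sc)

// register an input RDD under the DML variable name "X"
// (binBlocks: JavaPairRDD[MatrixIndexes, MatrixBlock])
ml.registerInput("X", binBlocks, numRows, numCols, 1000, 1000)
ml.registerOutput("Y")

// run the script and fetch the result as a DataFrame via MLOutput
val out = ml.execute("myScript.dml")
val yDF = out.getDF(sqlContext, "Y")
```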

3. With the exception of DataFrame, the RDDs supported by these classes
mirror the RDDs in the symbol table and the formats supported by the
read()/write() built-in functions. The following types of RDDs are
supported by these classes:
  a. Binary blocked RDD (JavaPairRDD<MatrixIndexes, MatrixBlock>) =>
corresponds to format="binary"
  b. String-based RDD (JavaRDD<String>) => corresponds to format="csv" or
format="text"
  c. DataFrame

See
http://apache.github.io/incubator-systemml/dml-language-reference.html#readwrite-built-in-functions
 for more details about the formats supported by read()/write() built-in
functions.
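For instance, on the RDD side the two string-based formats in (b) look like
this (file names hypothetical):

```scala
// format="csv": one comma-separated row per line, e.g. "1.0,2.0,3.0"
val csvRDD = sc.textFile("data.csv").toJavaRDD()

// format="text" (IJV): one "row col value" triple per line, e.g. "1 1 4.0"
val ijvRDD = sc.textFile("data.ijv").toJavaRDD()
```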

4. For all other types of RDDs, we decided to expose them through converter
utils:
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/utils/RDDConverterUtils.java
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/utils/RDDConverterUtilsExt.java

5. The utility functions in RDDConverterUtilsExt have not yet been tested
for performance and robustness. Once they are tested, they will be moved
into RDDConverterUtils. Most of these utils have javadocs within the code,
and we will add both a usage guide and external javadoc for them. The
following types of conversions are supported by the converter utils:
  a. CoordinateMatrix to Binary blocked RDD (See
coordinateMatrixToBinaryBlock in RDDConverterUtilsExt).
  b. Binary blocked RDD to String RDD.
  c. DataFrame with a Vector UDT column to binary block and vice versa.
This is useful when working with RDD<LabeledPoint>. (See
vectorDataFrameToBinaryBlock and binaryBlockToVectorDataFrame in
RDDConverterUtilsExt).
  d. DataFrame with double columns to binary block (See
dataFrameToBinaryBlock in RDDConverterUtilsExt). Since a DataFrame/RDD is a
collection, not an indexed/ordered sequence (at least not at the API
level), an ID column is inserted by MLOutput to denote the row index.
  e. Binary block to labeled points (See binaryBlockToLabeledPoints in
RDDConverterUtils).
  f. Conversion of text/cell/csv formats to and from binary-blocked RDDs
(See RDDConverterUtils).
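For example, conversion (e) can be sketched as follows (this assumes
binaryBlockToLabeledPoints takes the binary-blocked pair RDD directly;
please check the javadoc in RDDConverterUtils for the exact signature):

```scala
import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils

// binBlocks: JavaPairRDD[MatrixIndexes, MatrixBlock] whose last column
// holds the labels (e.g. produced by coordinateMatrixToBinaryBlock)
val labeledPoints = RDDConverterUtils.binaryBlockToLabeledPoints(binBlocks)
```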

6. The MLContext interface is Scala-compatible, i.e. we support both
JavaRDD and RDD, JavaSparkContext and SparkContext, java.util.HashMap and
scala.collection.immutable.Map, and so on.

7. MatrixCharacteristics is used to provide metadata (such as the number of
rows, number of columns, block row length, block column length, and number
of non-zeros) of an RDD to SystemML's optimizer. In some cases it is
required (for example: text and binary formats), while in others it can be
skipped (for example: csv and DataFrame). MLContext exposes convenient
wrappers such as void registerInput(String varName,
JavaPairRDD<MatrixIndexes,MatrixBlock> rdd, long rlen, long clen, int brlen
, int bclen) to avoid creating a MatrixCharacteristics object explicitly.
Here is the source code if you are interested:
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/MatrixCharacteristics.java

A good example of using MatrixCharacteristics and the converter utils is
provided in RDDConverterUtilsExt's javadoc:

    import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt
    import org.apache.sysml.runtime.matrix.MatrixCharacteristics
    import org.apache.spark.api.java.JavaSparkContext
    import org.apache.spark.mllib.linalg.distributed.MatrixEntry
    import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix

    val matRDD = sc.textFile("ratings.text")
      .map(_.split(" "))
      .map(x => new MatrixEntry(x(0).toLong, x(1).toLong, x(2).toDouble))
      .filter(_.value != 0)
      .cache
    require(matRDD.filter(x => x.i == 0 || x.j == 0).count == 0,
      "Expected 1-based ratings file")
    val nnz = matRDD.count
    val numRows = matRDD.map(_.i).max
    val numCols = matRDD.map(_.j).max
    val coordinateMatrix = new CoordinateMatrix(matRDD, numRows, numCols)
    val mc = new MatrixCharacteristics(numRows, numCols, 1000, 1000, nnz)
    val binBlocks = RDDConverterUtilsExt.coordinateMatrixToBinaryBlock(
      new JavaSparkContext(sc), coordinateMatrix, mc, true)


Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Deron Eriksson <deroneriksson@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	12/07/2015 02:50 PM
Subject:	Re: API documentation for SystemML



Hi Sourav,

One way to generate Javadocs for the entire SystemML project is "mvn
javadoc:javadoc".

Unfortunately, classes such as MatrixCharacteristics and RDDConverterUtils
currently have very minimal API documentation. We are hoping to address
this in the near future. However, you may find that the following
documentation link could be of assistance in getting started, given your
interest in Scala:

http://apache.github.io/incubator-systemml/mlcontext-programming-guide.html

Deron


On Mon, Dec 7, 2015 at 1:58 PM, Sourav Mazumder
<sourav.mazumder00@gmail.com
> wrote:

> Hi,
>
> Is there any Scala/Java API documentation available for classes like
>
> MatrixCharacteristics, RDDConverterUtils ?
>
> What I need to understand is what such helper utilities are available
> and the details of their signatures/APIs.
>
> Regards,
>
> Sourav
>

