[ https://issues.apache.org/jira/browse/SYSTEMML-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deron Eriksson closed SYSTEMML-1379.

> Investigate script metadata to simplify MLContext script interaction
> 
>
> Key: SYSTEMML-1379
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1379
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms, APIs
> Reporter: Deron Eriksson
> Assignee: Deron Eriksson
> Fix For: Not Applicable
>
>
> Currently, many scripts contain usage comments such as the following:
> {code}
> # THIS SCRIPT COMPUTES AN APPROXIMATE FACTORIZATION OF A LOW-RANK MATRIX X INTO TWO MATRICES U AND V
> # USING ALTERNATING-LEAST-SQUARES (ALS) ALGORITHM WITH CONJUGATE GRADIENT
> # MATRICES U AND V ARE COMPUTED BY MINIMIZING A LOSS FUNCTION (WITH REGULARIZATION)
> #
> # INPUT PARAMETERS:
> # ---------------------------------------------------------------------------------------------
> # NAME    TYPE     DEFAULT   MEANING
> # ---------------------------------------------------------------------------------------------
> # X       String   ---       Location to read the input matrix X to be factorized
> # U       String   ---       Location to write the factor matrix U
> # V       String   ---       Location to write the factor matrix V
> # rank    Int      10        Rank of the factorization
> # reg     String   "L2"      Regularization:
> #                            "L2" = L2 regularization;
> #                            "wL2" = weighted L2 regularization
> # lambda  Double   0.000001  Regularization parameter, no regularization if 0.0
> # maxi    Int      50        Maximum number of iterations
> # check   Boolean  FALSE     Check for convergence after every iteration, i.e., updating U and V once
> # thr     Double   0.0001    Assuming check is set to TRUE, the algorithm stops and convergence is declared
> #                            if the decrease in loss in any two consecutive iterations falls below this threshold;
> #                            if check is FALSE thr is ignored
> # fmt     String   "text"    The output format of the factor matrices L and R, such as "text" or "csv"
> # ---------------------------------------------------------------------------------------------
> # OUTPUT:
> # 1 An m x r matrix U, where r is the factorization rank
> # 2 An r x n matrix V
> #
> # HOW TO INVOKE THIS SCRIPT - EXAMPLE:
> # hadoop jar SystemML.jar -f ALS-CG.dml -nvargs X=INPUT_DIR/X U=OUTPUT_DIR/U V=OUTPUT_DIR/V rank=10 reg="L2" lambda=0.0001 fmt=csv
> {code}
> Comments such as these are difficult to consult from a programmatic, interactive environment
> such as the Spark Shell. If the same information were provided in a parseable format, such as
> JSON or XML, it could be parsed and exposed programmatically, for example through the MLContext
> API in the Spark Shell.
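
As a rough illustration of the proposal, here is a minimal Java sketch, not an existing SystemML feature. It assumes a hypothetical XML layout (the `<script>`, `<input>` and `<output>` elements and the `name`, `type` and `default` attributes are invented for this example) that mirrors part of the ALS-CG parameter table, and parses it with the JDK's built-in DOM parser (`javax.xml.parsers`) into parameter descriptors that an interactive session could list or check. The class and method names (`ScriptMetadataSketch`, `parse`, `usage`) are likewise hypothetical, and the sketch does not call the MLContext API itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ScriptMetadataSketch {

    // Hypothetical metadata for ALS-CG.dml; the element and attribute names are
    // illustrative assumptions, not a format that SystemML defines.
    static final String SAMPLE_XML =
          "<script name='ALS-CG.dml'>"
        + "<input name='X' type='String'>Location to read the input matrix X to be factorized</input>"
        + "<input name='rank' type='Int' default='10'>Rank of the factorization</input>"
        + "<input name='lambda' type='Double' default='0.000001'>Regularization parameter, no regularization if 0.0</input>"
        + "<input name='maxi' type='Int' default='50'>Maximum number of iterations</input>"
        + "<output name='U'>An m x r matrix U, where r is the factorization rank</output>"
        + "<output name='V'>An r x n matrix V</output>"
        + "</script>";

    /** One declared parameter; defaultValue is null when no default is given. */
    static final class Param {
        final String name, type, defaultValue, meaning;

        Param(String name, String type, String defaultValue, String meaning) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
            this.meaning = meaning;
        }

        boolean required() { return defaultValue == null; }
    }

    /** Returns the parameters declared with the given tag ("input" or "output"). */
    static List<Param> parse(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<Param> params = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // A missing "default" attribute marks the parameter as required.
                String dflt = e.hasAttribute("default") ? e.getAttribute("default") : null;
                params.add(new Param(e.getAttribute("name"), e.getAttribute("type"),
                                     dflt, e.getTextContent().trim()));
            }
            return params;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Malformed script metadata", ex);
        }
    }

    /** Builds a human-readable usage summary from the metadata. */
    static String usage(String xml) {
        StringBuilder sb = new StringBuilder("Inputs:\n");
        for (Param p : parse(xml, "input")) {
            String status = p.required() ? "required" : "default " + p.defaultValue;
            sb.append("  ").append(p.name).append(" (").append(p.type).append(", ")
              .append(status).append("): ").append(p.meaning).append('\n');
        }
        sb.append("Outputs:\n");
        for (Param p : parse(xml, "output")) {
            sb.append("  ").append(p.name).append(": ").append(p.meaning).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(usage(SAMPLE_XML));
    }
}
```

Leaving `default` out for required inputs lets `required()` be derived instead of stored separately. A JSON layout would serve equally well, but Java SE has no built-in JSON parser, so it would need a third-party library.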

This message was sent by Atlassian JIRA
(v6.4.14#64029)
