systemml-dev mailing list archives

From Anthony Thomas <>
Subject Passing a CoordinateMatrix to SystemML
Date Thu, 21 Dec 2017 23:00:05 GMT
Hi SystemML folks,

I'm trying to pass some data from Spark to a DML script via the MLContext
API. The data is derived from a parquet file containing a dataframe with
the schema: [label: Integer, features: SparseVector]. I am doing the following:

        val input_data = spark.read.parquet(...)
        val x = input_data.select("features")
        val y = input_data.select("y")
        val x_meta = new MatrixMetadata(MatrixFormat.DF_VECTOR)
        val y_meta = new MatrixMetadata(MatrixFormat.DF_DOUBLES)
        val script = dmlFromFile(s"${script_path}/script.dml").
                in("X", x, x_meta).
                in("Y", y, y_meta)
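
For completeness, I then execute the script roughly like this (a sketch; assume `spark` is the active SparkSession, and "B" is a placeholder for whatever the DML script actually assigns as output):

```scala
import org.apache.sysml.api.mlcontext.MLContext

// Assumes `script` is the Script object built above with its inputs bound.
val ml = new MLContext(spark)
val results = ml.execute(script.out("B"))  // "B" is a placeholder output name
val b = results.getMatrix("B")
```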

However, this results in an error from SystemML:
java.lang.ArrayIndexOutOfBoundsException: 0
I'm guessing this has something to do with Spark ML being zero-indexed and
SystemML being one-indexed. Is there something I should be doing differently
here? Note that I also tried converting the dataframe to a CoordinateMatrix
and then creating an RDD[String] in IJV format; that too resulted in an
ArrayIndexOutOfBoundsException. I'm guessing I'm doing something simple
wrong here, but I haven't been able to figure out exactly what.
Please let me know if you need more information (I can send along the full
error stacktrace if that would be helpful)!
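
In case it helps, the CoordinateMatrix-to-IJV conversion I tried was along these lines (a sketch from memory; the one-based shift is what I believe SystemML's IJV text format expects):

```scala
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
import org.apache.spark.rdd.RDD

// CoordinateMatrix entries are 0-indexed, while SystemML's IJV text
// format is 1-indexed, so shift both row and column indices by one.
def toIJV(m: CoordinateMatrix): RDD[String] =
  m.entries.map(e => s"${e.i + 1} ${e.j + 1} ${e.value}")
```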


