systemml-dev mailing list archives

From arijit chakraborty <ak...@hotmail.com>
Subject Improve SystemML execution speed in Spark
Date Wed, 10 May 2017 17:31:09 GMT
Hi,


I'm building a process in SystemML and running it through Spark. I run the code in
the following way:


# Spark Specifications:


import os
import sys
import pandas as pd
import numpy as np

spark_path = r"C:\spark"  # raw string, so the backslash is not read as an escape
os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext("local[*]", "test")


# SystemML Specifications:


from pyspark.sql import SQLContext
import systemml as sml
sqlCtx = SQLContext(sc)
ml = sml.MLContext(sc)


# Importing the data


train_data = pd.read_csv("data1.csv")
test_data  = pd.read_csv("data2.csv")



# read_csv already returns a pandas DataFrame, so no extra pd.DataFrame() wrap is needed
train_data = sqlCtx.createDataFrame(train_data)
test_data  = sqlCtx.createDataFrame(test_data)


# Finally executing the code:


scriptUrl = "C:/systemml-0.13.0-incubating-bin/scripts/model_code.dml"

script = sml.dml(scriptUrl).input(bdframe_train=train_data, bdframe_test=test_data).output("check_func")

beta = ml.execute(script).get("check_func").toNumPy()

pd.DataFrame(beta).head(1)

The data sizes are 1000 and 100 rows for train and test respectively. I'm testing on a small
dataset during development and will test on a larger dataset later. I'm running on my local
system with 4 cores.

The problem is that the same model runs in a fraction of a second in R, but when I run it
like this, it takes around 20-30 seconds.
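
One way to narrow down where the 20-30 seconds go would be to time each stage separately. The following is only a minimal sketch using the Python standard library; `timed` and `timings` are hypothetical helper names, and the workload inside the `with` block is a stand-in rather than the actual SystemML call:

```python
import time
from contextlib import contextmanager

# Collected wall-clock durations, keyed by stage label (hypothetical helper state).
timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Stand-in workload; in the real script each stage would get its own block.
with timed("example_stage"):
    sum(range(100000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping the SparkContext creation, the two createDataFrame calls, and the ml.execute(...) line each in its own `with timed(...)` block would show whether the time is spent in startup, in data conversion, or in the script itself.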

Could anyone please suggest how to improve the execution speed? If there is any other way
to execute the code that would be faster, I'd like to hear about that as well.

Also, thank you all for releasing the 0.14 version. There are a few improvements we
found extremely helpful.

Thank you!
Arijit

