systemml-issues mailing list archives

From "Niketan Pansare (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SYSTEMML-880) Push-down loop structures in Python DSL
Date Tue, 10 Jul 2018 21:55:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539267#comment-16539267 ]

Niketan Pansare edited comment on SYSTEMML-880 at 7/10/18 9:54 PM:
-------------------------------------------------------------------

Pushing the loop down into DML avoids the per-invocation overhead of ml.execute and also
enables additional optimizations. Here is a simple PySpark script that demonstrates the overhead:

 
{code:python}
from systemml import MLContext, dml
import numpy as np
import time
numpyX = np.ones((10000,100))
# `sc` is the SparkContext that the pyspark shell provides
ml = MLContext(sc)

# Execute with pushdown of loop
script_with_loop = dml('s = 0; for(i in 1:1000) { s = s + sum(X); } ')
t0 = time.time()
ml.execute(script_with_loop.input(X=numpyX).output('s')).get('s')
print('Total time with loop:' +  str(time.time()-t0))
# Total time with loop:2.50334095955

# Execute without pushdown of loop
pythonS = 0
totalTime = 0
script_without_loop = dml('s = s + sum(X)').input(X=numpyX).output('s')
for i in range(1000):
    t0 = time.time()
    pythonS = ml.execute(script_without_loop.input(s=pythonS)).get('s')
    totalTime = totalTime + time.time()-t0

print('Total time without loop:' +  str(totalTime))
# Total time without loop:1008.73590732
{code}
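
To put the two measurements in the listing above side by side, the short calculation below derives the average cost of one ml.execute call and the overall slowdown. It uses only the timings recorded in the output comments; the variable names are illustrative, and the numbers come from a single run, so treat the result as an order-of-magnitude indication rather than a benchmark.

```python
# Timings (in seconds) copied from the output comments in the listing above.
total_with_loop = 2.50334095955      # one ml.execute call, loop inside DML
total_without_loop = 1008.73590732   # 1000 ml.execute calls, loop in Python
iterations = 1000

# Average wall-clock time of a single ml.execute call in the Python-side loop.
per_call = total_without_loop / iterations

# How much slower the Python-side loop is than the pushed-down loop.
slowdown = total_without_loop / total_with_loop

print('Average time per ml.execute call: %.3f s' % per_call)
print('Slowdown without pushdown: %.0fx' % slowdown)
```

Each invocation costs about one second, and the run without pushdown is roughly 400 times slower, even though both variants compute the same sum.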
 

One way to implement this is to mark the boundaries of the code to push down with a decorator
(for example: parallelize), starting with support for simple expressions and a loop structure.
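
As a rough illustration of what such a translation could look like, the sketch below uses Python's standard ast module to turn a very small subset of Python (plain assignments, arithmetic, function calls, and a `for` loop over `range(n)` with a positive integer literal) into a DML string. Everything here is an assumption made for illustration: `python_to_dml`, `stmt_to_dml` and `expr_to_dml` are hypothetical helpers and not part of the SystemML Python API, and the sketch assumes that called function names such as `sum` map one-to-one onto DML builtins. A decorator like the one suggested above could obtain the function body (for instance with `inspect.getsource`) and pass it to a translator of this kind.

```python
import ast

# Hypothetical helpers for illustration only; not part of the SystemML API.
_BINOPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}


def expr_to_dml(node):
    """Translate a small subset of Python expressions into DML text."""
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        # Always parenthesize so that operator precedence is preserved.
        return '(%s %s %s)' % (expr_to_dml(node.left),
                               _BINOPS[type(node.op)],
                               expr_to_dml(node.right))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and not node.keywords):
        # Assumption: the Python function name is also a valid DML builtin.
        args = ', '.join(expr_to_dml(a) for a in node.args)
        return '%s(%s)' % (node.func.id, args)
    raise NotImplementedError('unsupported expression: ' + ast.dump(node))


def _is_simple_range(node):
    """True for range(n) where n is an integer literal >= 1."""
    return (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == 'range' and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Constant)
            and type(node.args[0].value) is int
            and node.args[0].value >= 1)


def stmt_to_dml(node):
    """Translate an assignment or a simple for loop into DML text."""
    if (isinstance(node, ast.Assign) and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)):
        return '%s = %s;' % (node.targets[0].id, expr_to_dml(node.value))
    if (isinstance(node, ast.For) and not node.orelse
            and isinstance(node.target, ast.Name)
            and _is_simple_range(node.iter)):
        # range(n) counts 0..n-1 while DML's 1:n counts 1..n, so the rewrite
        # is only safe when the body never refers to the loop variable.
        for stmt in node.body:
            for sub in ast.walk(stmt):
                if isinstance(sub, ast.Name) and sub.id == node.target.id:
                    raise NotImplementedError('loop variable used in body')
        body = ' '.join(stmt_to_dml(s) for s in node.body)
        return 'for(%s in 1:%d) { %s }' % (node.target.id,
                                           node.iter.args[0].value, body)
    raise NotImplementedError('unsupported statement: ' + ast.dump(node))


def python_to_dml(source):
    """Translate Python source text (supported subset only) into DML."""
    return ' '.join(stmt_to_dml(s) for s in ast.parse(source).body)


python_source = "s = 0\nfor i in range(1000):\n    s = s + sum(X)\n"
print(python_to_dml(python_source))
```

For the loop from the timing script, this produces a DML string equivalent to the hand-written `script_with_loop`, up to the extra parentheses. Anything outside the supported subset raises NotImplementedError, which a real implementation could use as the signal to fall back to executing the code on the Python side.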

 

A few related links:

[https://greentreesnakes.readthedocs.io/en/latest/nodes.html#control-flow]

[https://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts]

 


> Push-down loop structures in Python DSL
> ---------------------------------------
>
>                 Key: SYSTEMML-880
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-880
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Niketan Pansare
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
