systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <>
Subject [jira] [Commented] (SYSTEMML-2396) Batch pre-fetching per workers
Date Fri, 15 Jun 2018 05:17:00 GMT


Matthias Boehm commented on SYSTEMML-2396:

In principle yes, but the description currently seems to mix two things: (1) the
order of batch slicing, and (2) the interleaving of compute and slicing.
* Order of batch slicing: Currently we perform pull (blocking), slice, and compute. A simple
approach to reduce the waiting time is to perform slice, pull (blocking), compute. The slicing
then overlaps with time we would otherwise spend waiting on the pull, which hides the slice overhead.
* Interleaving: Additionally, we could interleave computation and the slicing of the next batch
by using double buffering or, more generally, a blocking queue for n batches (and yes, with a
dedicated prefetch thread).
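
To make the two options concrete, here is a minimal sketch in Python rather than
SystemML's own code. All names (slice_batch, run_worker_reordered, run_worker_prefetch,
pull_model, compute_gradient, push_gradient, n_buffers) are hypothetical placeholders and
not part of the SystemML API. The first function only reorders the steps as in (1). The
second one adds a dedicated prefetch thread and a bounded blocking queue as in (2).

```python
import queue
import threading


def slice_batch(data, start, size):
    """Hypothetical stand-in for slicing one mini-batch from the local data."""
    return data[start:start + size]


def run_worker_reordered(data, batch_size, pull_model, compute_gradient, push_gradient):
    """Option (1): slice, pull (blocking), compute. No extra threads."""
    for start in range(0, len(data), batch_size):
        # Slice first, so the slicing overlaps with the server's aggregation.
        batch = slice_batch(data, start, batch_size)
        model = pull_model()  # blocking pull of the current model
        push_gradient(compute_gradient(model, batch))


def _prefetch(data, batch_size, batches):
    """Producer: slices batches ahead of the consumer."""
    for start in range(0, len(data), batch_size):
        # put() blocks while the queue already holds n_buffers batches.
        batches.put(slice_batch(data, start, batch_size))
    batches.put(None)  # sentinel: no more batches


def run_worker_prefetch(data, batch_size, pull_model, compute_gradient,
                        push_gradient, n_buffers=2):
    """Option (2): a prefetch thread fills a blocking queue of sliced batches."""
    batches = queue.Queue(maxsize=n_buffers)
    producer = threading.Thread(
        target=_prefetch, args=(data, batch_size, batches), daemon=True)
    producer.start()
    while True:
        batch = batches.get()  # blocks only if the prefetcher has fallen behind
        if batch is None:
            break
        model = pull_model()  # blocking pull of the current model
        push_gradient(compute_gradient(model, batch))
    producer.join()
```

Both functions take the same callbacks, so they can be swapped in a local experiment to
compare timings. The queue capacity n_buffers bounds how many sliced batches are held in
memory at once.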

While (1) is generally a good idea and does not introduce complexity, for (2) we need to see
the experimental results because it would add complexity to the design. Please run a couple
of local experiments with your new stats output and investigate the slicing of dense and sparse

> Batch pre-fetching per workers
> ------------------------------
>                 Key: SYSTEMML-2396
>                 URL:
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
> This task aims to improve the performance of the workers. Currently, in each mini-batch
> iteration, a worker slices the matrix, computes the gradients, and then sends them
> to the ps to update the model. While the ps does the aggregation work, the worker pauses,
> waiting for the new model. Hence the idea is to use this idle time to pre-fetch
> the next mini-batch in order to speed up the following iteration.

