systemml-dev mailing list archives

From Matthias Boehm <mboe...@googlemail.com>
Subject Re: Parfor semantics
Date Wed, 23 Nov 2016 10:12:06 GMT
well, it has been used for similar use cases. It works well if the 
dataset fits into the memory of each worker. For very large datasets, the 
distributed right indexing is an issue, as it prevents us from running 
parfor itself as a distributed operation. This could be addressed via 
block partitioning, but so far we only support row/column partitioning.
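
Just as an illustration, a minimal DML sketch of such a remote 
hyperparameter sweep could look as follows (the lambda grid, the 
trainModel function, and the inputs X/y are hypothetical placeholders, 
assumed to be defined elsewhere):

lambdas = matrix("0.001 0.01 0.1 1.0", rows=4, cols=1);
B = matrix(0, rows=nrow(lambdas), cols=ncol(X));

parfor (i in 1:nrow(lambdas), opt=CONSTRAINED, par=4, mode=REMOTE_SPARK) {
    # each worker trains one model over the full dataset X, y
    lambda = as.scalar(lambdas[i, 1]);
    b = trainModel(X, y, lambda);  # e.g., mini-batch SGD inside
    B[i, ] = t(b);                 # per-model results are merged after the parfor
}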

Regards,
Matthias

On 11/23/2016 2:54 AM, dusenberrymw@gmail.com wrote:
> Also for some context, we're aiming to use this for remote hyperparameter tuning over
> a large dataset.  Specifically, each remote process would train a separate model over the
> full dataset using a mini-batch SGD approach.  Has the `parfor` construct been used for this
> purpose before?
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
>> On Nov 22, 2016, at 2:01 PM, Matthias Boehm <mboehm7@googlemail.com> wrote:
>>
>> that's a good catch - thanks, Felix. It would be great if you could modify rewriteSetExecutionStrategy
>> and rewriteSetFusedDataPartitioningExecution in OptimizerConstrained to handle the respective
>> Spark execution types. Thanks.
>>
>> Regards,
>> Matthias
>>
>>> On 11/22/2016 7:54 PM, fschueler@posteo.de wrote:
>>> The constrained optimizer doesn't seem to know about a REMOTE_SPARK
>>> execution mode and either sets CP or REMOTE_MR. I can open a jira for
>>> that and provide a fix.
>>>
>>> Felix
>>>
>>>> On 11/22/2016 2:07 AM, Matthias Boehm wrote:
>>>> yes, this came up several times - initially we only supported opt=NONE
>>>> where users had to specify all other parameters. Meanwhile, there is a
>>>> so-called "constrained optimizer" that does the same as the rule-based
>>>> optimizer but respects any given parameters. Please try something like
>>>> this:
>>>>
>>>> parfor (i in 1:10, opt=CONSTRAINED, par=10, mode=REMOTE_SPARK) {
>>>>     // some code here
>>>> }
>>>>
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>>> On 11/22/2016 12:33 AM, fschueler@posteo.de wrote:
>>>>> While debugging some ParFor code it became clear that the parameters for
>>>>> parfor can be easily overwritten by the optimizer.
>>>>> One example is when I write:
>>>>>
>>>>> ```
>>>>> parfor (i in 1:10, par=10, mode=REMOTE_SPARK) {
>>>>>    // some code here
>>>>> }
>>>>> ```
>>>>>
>>>>> Depending on the data size and cluster resources, the optimizer
>>>>> (OptimizerRuleBased.java, line 844) will recognize that the work can be
>>>>> done locally and overwrite it to local execution. This might be valid
>>>>> and definitely works (in my case) but kind of contradicts what I want
>>>>> SystemML to do.
>>>>> I wonder if we should disable this optimization in case a concrete
>>>>> execution mode is given and go with the mode that is provided.
>>>>>
>>>>> Felix
>>>>>
>>>>>
>>>
>>>
>
