reef-dev mailing list archives

From "Joo Seong (Jason) Jeong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1479) Define interface for distributed dataset
Date Mon, 11 Jul 2016 21:02:11 GMT

    [ https://issues.apache.org/jira/browse/REEF-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371612#comment-15371612 ]

Joo Seong (Jason) Jeong commented on REEF-1479:
-----------------------------------------------

Thanks, that would be much better than doing it this way! I'll make a PR soon.

> Define interface for distributed dataset 
> -----------------------------------------
>
>                 Key: REEF-1479
>                 URL: https://issues.apache.org/jira/browse/REEF-1479
>             Project: REEF
>          Issue Type: Sub-task
>          Components: REEF.NET
>            Reporter: Joo Seong (Jason) Jeong
>
> As a first step of [REEF-1477|https://issues.apache.org/jira/browse/REEF-1477], we'd
> like to define an interface for the distributed dataset that we will work with. This dataset
> interface serves as an abstraction of many dataset partitions, one on each Evaluator. In some
> sense, the class {{IPartitionedInputDataSet}} is very similar to what we want, except that
> the new interface will contain action methods like {{RunIMRU}} or {{RunTransform}}.
> {code}
> interface IDataSet<T> {
>   // Apply a transform to each partition of this dataset independently.
>   // transformConf is shipped to every partition.
>   IDataSet<TResult> TransformPartitions<TResult>(IConfiguration transformConf);
>   // General interface for applying operations that, unlike
>   // TransformPartitions(), are aware of all partitions.
>   IDataSet<TResult> RunStage<TResult>(IConfiguration stageConf);
>   // Fetch the actual data into the local process.
>   // May throw an OutOfMemoryException if the data is too large.
>   T[] Collect();
> }
> {code}
> Writing the data to stable storage via a {{Store()}} method can be considered a special
> case of {{RunStage()}}. On the other hand, {{Load()}} must be defined in a separate
> interface/class and may depend on the backing filesystem.
> Other interfaces that allow 'Stage' implementations and partition access from Tasks must
> also be newly defined.
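
To make the proposal above concrete, here is a minimal single-process sketch of how the three methods might relate, including Store() expressed as a RunStage() call. Nothing in it is REEF's actual API: `IConfiguration` is redeclared as an empty stand-in, and `FuncConfiguration`, `InMemoryDataSet`, and `Demo` are hypothetical names invented for this illustration. The result element type is written as a method-level type parameter `TResult`, which is one possible reading of the proposal. Each inner array stands in for the partition held by one Evaluator, and a plain list stands in for stable storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for REEF's configuration type (assumption, not the real API).
// Here it only wraps a delegate so the sketch can run in one process.
public interface IConfiguration { }

// Hypothetical configuration that carries the operation to apply.
public sealed class FuncConfiguration<TIn, TOut> : IConfiguration
{
    public Func<TIn, TOut> Apply { get; }
    public FuncConfiguration(Func<TIn, TOut> apply) { Apply = apply; }
}

public interface IDataSet<T>
{
    IDataSet<TResult> TransformPartitions<TResult>(IConfiguration transformConf);
    IDataSet<TResult> RunStage<TResult>(IConfiguration stageConf);
    T[] Collect();
}

// Hypothetical in-memory implementation: one inner array per partition.
public sealed class InMemoryDataSet<T> : IDataSet<T>
{
    private readonly T[][] _partitions;
    public InMemoryDataSet(T[][] partitions) { _partitions = partitions; }

    // Partition-wise: the operation sees one partition at a time.
    public IDataSet<TResult> TransformPartitions<TResult>(IConfiguration transformConf)
    {
        var conf = (FuncConfiguration<T[], TResult[]>)transformConf;
        return new InMemoryDataSet<TResult>(_partitions.Select(conf.Apply).ToArray());
    }

    // Stage: the operation sees all partitions at once.
    public IDataSet<TResult> RunStage<TResult>(IConfiguration stageConf)
    {
        var conf = (FuncConfiguration<T[][], TResult[][]>)stageConf;
        return new InMemoryDataSet<TResult>(conf.Apply(_partitions));
    }

    // Flattens every partition into one local array.
    public T[] Collect() => _partitions.SelectMany(p => p).ToArray();
}

public static class Demo
{
    public static void Main()
    {
        IDataSet<int> data = new InMemoryDataSet<int>(
            new[] { new[] { 1, 2 }, new[] { 3, 4 } });

        // Double every element, one partition at a time.
        IDataSet<int> doubled = data.TransformPartitions<int>(
            new FuncConfiguration<int[], int[]>(p => p.Select(x => x * 2).ToArray()));

        // "Store" as a stage: copy every partition into a list that
        // stands in for stable storage, then pass the data through.
        var storage = new List<int[]>();
        doubled.RunStage<int>(new FuncConfiguration<int[][], int[][]>(parts =>
        {
            storage.AddRange(parts);
            return parts;
        }));

        Console.WriteLine(string.Join(",", doubled.Collect())); // prints 2,4,6,8
        Console.WriteLine(storage.Count);                       // prints 2
    }
}
```

Load() is deliberately left out of `IDataSet<T>` here, in line with the description's point that it belongs in a separate interface or class tied to the backing filesystem. In this sketch the `InMemoryDataSet` constructor plays that role.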



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
