reef-dev mailing list archives

From "Joo Seong (Jason) Jeong (JIRA)"
Subject [jira] [Created] (REEF-1479) Define interface for distributed dataset
Date Thu, 30 Jun 2016 20:15:10 GMT
Joo Seong (Jason) Jeong created REEF-1479:

             Summary: Define interface for distributed dataset 
                 Key: REEF-1479
             Project: REEF
          Issue Type: Sub-task
            Reporter: Joo Seong (Jason) Jeong

As a first step toward REEF-1477, we'd like to define an interface for the
distributed dataset that we will work with. This interface serves as an
abstraction over many dataset partitions, one on each Evaluator. The existing
{{IPartitionedInputDataSet}} is close to what we want, except that the new
interface will also contain action methods such as {{RunIMRU}} and {{RunTransform}}.

using System;

interface IDataSet<T> {
  // Apply a transform to this dataset and return the resulting dataset.
  IDataSet<TResult> RunTransform<TResult>(Transform<T, TResult> transform);

  // Run an IMRU job on this dataset and return its result.
  TResult RunIMRU<TResult>(IMRUConfiguration<T, TResult> imruConfiguration);

  // Store this dataset to some destination.
  void Store(Uri uri);
}

interface IDataSetLoader<T> {
  // Generate a dataset from some source.
  IDataSet<T> Load(Uri uri);
}
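
To make the intended call pattern concrete, here is a rough single-process sketch of how such a dataset might behave. Everything in it is an assumption made for illustration: {{InMemoryDataSet}}, {{LocalImruConfiguration}} and {{Demo}} are hypothetical names, the transform is modeled as a plain {{Func}}, and the IMRU job is reduced to a single map and reduce round with no iterative update step. It does not use any existing REEF API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IMRUConfiguration<T, TResult>: a per-partition map
// followed by a pairwise reduce of the partial results. Not the REEF IMRU API.
public sealed class LocalImruConfiguration<T, TResult>
{
    public LocalImruConfiguration(
        Func<IEnumerable<T>, TResult> map,
        Func<TResult, TResult, TResult> reduce)
    {
        Map = map;
        Reduce = reduce;
    }

    public Func<IEnumerable<T>, TResult> Map { get; }
    public Func<TResult, TResult, TResult> Reduce { get; }
}

// Hypothetical in-memory model of the proposed dataset: each inner list plays
// the role of the partition held by one Evaluator.
public sealed class InMemoryDataSet<T>
{
    private readonly List<List<T>> _partitions;

    public InMemoryDataSet(IEnumerable<IEnumerable<T>> partitions)
    {
        // Materialize eagerly so later calls see a fixed snapshot.
        _partitions = partitions.Select(p => p.ToList()).ToList();
    }

    public int PartitionCount => _partitions.Count;

    // Apply the transform element by element, preserving the partitioning.
    public InMemoryDataSet<TResult> RunTransform<TResult>(Func<T, TResult> transform)
    {
        return new InMemoryDataSet<TResult>(_partitions.Select(p => p.Select(transform)));
    }

    // Map each partition to a partial result, then fold the partials together.
    // Throws InvalidOperationException if the dataset has no partitions.
    public TResult RunIMRU<TResult>(LocalImruConfiguration<T, TResult> config)
    {
        return _partitions.Select(p => config.Map(p)).Aggregate(config.Reduce);
    }
}

public static class Demo
{
    // Squares every element, then sums across both partitions.
    public static int SumOfSquares()
    {
        var data = new InMemoryDataSet<int>(new[] { new[] { 1, 2 }, new[] { 3, 4 } });
        var squared = data.RunTransform(x => x * x);
        return squared.RunIMRU(new LocalImruConfiguration<int, int>(
            partition => partition.Sum(),
            (a, b) => a + b));
    }
}
```

In this sketch {{RunTransform}} returns a new dataset with the same number of partitions, while {{RunIMRU}} collapses the dataset into a single value. That is the distinction the two action methods are meant to capture.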

This message was sent by Atlassian JIRA
