reef-dev mailing list archives

From "Markus Weimer (JIRA)" <>
Subject [jira] [Assigned] (REEF-424) Add Iterative Map-Reduce-Update
Date Thu, 09 Nov 2017 23:27:00 GMT


Markus Weimer reassigned REEF-424:

    Assignee:     (was: Markus Weimer)

> Add Iterative Map-Reduce-Update
> -------------------------------
>                 Key: REEF-424
>                 URL:
>             Project: REEF
>          Issue Type: New Feature
>          Components: IMRU, REEF.NET
>            Reporter: Markus Weimer
> Many popular machine learning algorithms can be expressed in what's known as the statistical
query model (SQM): they rely on aggregate statistics rather than random access to the data. In
the most common case, those statistics are aggregates of a function applied to each example in
the dataset. Such queries map trivially to the map-reduce programming paradigm.
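> To make "aggregate statistics" concrete, here is a minimal Python sketch (an illustration, not part of this proposal) of a single SQM query: the gradient of a 1-D least-squares loss, computed as a map over examples followed by a sum. The dataset and the names {{per_example_gradient}} and {{sqm_query}} are hypothetical.

```python
from functools import reduce

# Hypothetical toy dataset of (x, y) pairs for a 1-D linear model y ~ w * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]


def per_example_gradient(w, example):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w, for one example."""
    x, y = example
    return (w * x - y) * x


def sqm_query(w, dataset):
    """One statistical query: a sum of the same function applied to every example."""
    mapped = map(lambda ex: per_example_gradient(w, ex), dataset)  # map phase
    return reduce(lambda a, b: a + b, mapped, 0.0)                 # reduce phase (sum)


print(sqm_query(0.0, data))  # -31.0
print(sqm_query(2.0, data))  # -3.0
```

> Because the sum is associative and commutative, the map step can run over any partitioning of the data, which is exactly what a map-reduce system exploits.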
> However, most ML algorithms perform many such queries iteratively. This leads to
inefficiencies on traditional map-reduce systems: each query turns into a separate job that
must be scheduled, must read its input, and must persist its output.
> We propose Iterative Map Reduce Update (IMRU), a simple extension to the map-reduce abstraction
to capture such programs in three functions:
>   * {{TMapOutput Map(TMapInput input)}} is a map function with side information. It is
assumed to access the training data through other means; its {{input}} is the mutable state
of the computation, as produced by the {{Update}} function.
>   * {{TMapOutput Reduce(params TMapOutput[] mapoutput)}} is a (pure) reduce function.
>   * {{Tuple<TMapInput,TResult> Update(TMapOutput mapoutput)}} takes the (aggregated)
outputs of the Map functions and produces a new set of inputs for them, a result of the
computation, or both. Computation terminates once no further {{TMapInput}} is produced.
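> A minimal, sequential Python sketch of how these three functions could be driven, assuming Map, Reduce and Update are plain callables. {{run_imru}}, {{make_mapper}}, {{reduce_pairs}} and {{LeastSquaresUpdate}} are hypothetical names, not the REEF.NET IMRU API. Each mapper closes over its own data partition (the "side information"), Update keeps the model as internal state, and the loop stops once Update returns no new map input. The toy job is gradient descent for 1-D least squares.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

TMapInput = TypeVar("TMapInput")
TMapOutput = TypeVar("TMapOutput")
TResult = TypeVar("TResult")


def run_imru(
    mappers: Sequence[Callable[[TMapInput], TMapOutput]],
    reduce_fn: Callable[..., TMapOutput],
    update_fn: Callable[[TMapOutput], Tuple[Optional[TMapInput], Optional[TResult]]],
    initial_input: TMapInput,
) -> List[TResult]:
    """Sequential IMRU driver: iterate until Update yields no new map input."""
    results: List[TResult] = []
    current: Optional[TMapInput] = initial_input
    while current is not None:
        outputs = [map_fn(current) for map_fn in mappers]  # Map: one call per partition
        aggregated = reduce_fn(*outputs)                    # Reduce: pure, variadic
        current, result = update_fn(aggregated)             # Update: new input, result, or both
        if result is not None:
            results.append(result)
    return results


# --- Hypothetical example job: gradient descent for 1-D least squares (y ~ w * x) ---

def make_mapper(partition):
    """Each mapper reaches its training data through a closure, not through its input."""
    def map_fn(w):
        # Partial statistics: (sum of per-example gradients, number of examples).
        return (sum((w * x - y) * x for x, y in partition), len(partition))
    return map_fn


def reduce_pairs(*outputs):
    """Pure reduce: element-wise sum of (gradient_sum, count) pairs."""
    return (sum(g for g, _ in outputs), sum(n for _, n in outputs))


class LeastSquaresUpdate:
    """Holds the model; emits a new w to map over, or the final w as the result."""

    def __init__(self, w0, lr=0.1, tol=1e-9, max_iters=1000):
        self.w, self.lr, self.tol, self.iters_left = w0, lr, tol, max_iters

    def __call__(self, aggregated):
        grad_sum, n = aggregated
        grad = grad_sum / n
        self.w -= self.lr * grad
        self.iters_left -= 1
        if abs(grad) < self.tol or self.iters_left == 0:
            return None, self.w  # no further TMapInput: the computation terminates
        return self.w, None      # next round's map input, no result yet


mappers = [make_mapper([(1.0, 2.0), (2.0, 4.0)]), make_mapper([(3.0, 6.0)])]
update = LeastSquaresUpdate(w0=0.0)
results = run_imru(mappers, reduce_pairs, update, update.w)
print(results)  # a single result, very close to 2.0
```

> Keeping the model inside the Update object is one way to fit the signature above, where Update receives only the aggregated {{TMapOutput}}.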
> As part of this work, we will introduce the IMRU API, a local (multi-threaded) test
harness, and an implementation on top of REEF. Getting the data into the mappers is out of
scope here and will be addressed in a separate JIRA.
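> The local (threaded) test harness mentioned above could look roughly like the following Python sketch, which runs each round's Map calls on a {{concurrent.futures.ThreadPoolExecutor}} and then applies Reduce and Update on the driver thread. It is an illustration under assumptions, not the planned implementation: {{run_imru_threaded}}, the 1-D k-means job and its helper names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_imru_threaded(mappers, reduce_fn, update_fn, initial_input, max_workers=4):
    """Local test harness: Map calls of each round run concurrently on a thread pool."""
    current, last_result = initial_input, None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while current is not None:
            # Submit every mapper with this round's input, then wait for all of them.
            futures = [pool.submit(map_fn, current) for map_fn in mappers]
            outputs = [f.result() for f in futures]
            # Reduce and Update run on the driver thread.
            current, result = update_fn(reduce_fn(*outputs))
            if result is not None:
                last_result = result
    return last_result


# --- Hypothetical example job: 1-D k-means ---

def make_kmeans_mapper(partition):
    """Each mapper owns one data partition; its input is only the current centroids."""
    def map_fn(centroids):
        sums, counts = [0.0] * len(centroids), [0] * len(centroids)
        for x in partition:
            k = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            sums[k] += x
            counts[k] += 1
        return sums, counts
    return map_fn


def reduce_stats(*outputs):
    """Pure reduce: element-wise sums of the per-mapper (sums, counts)."""
    sums = [sum(col) for col in zip(*(s for s, _ in outputs))]
    counts = [sum(col) for col in zip(*(c for _, c in outputs))]
    return sums, counts


class KMeansUpdate:
    def __init__(self, centroids):
        self.centroids = list(centroids)

    def __call__(self, stats):
        sums, counts = stats
        # Keep the old centroid if a cluster received no points.
        new = [s / n if n else c for s, n, c in zip(sums, counts, self.centroids)]
        if new == self.centroids:
            return None, tuple(new)  # converged: emit the result, stop iterating
        self.centroids = new
        return new, None


mappers = [make_kmeans_mapper([1.0, 2.0, 10.0]), make_kmeans_mapper([3.0, 11.0, 12.0])]
update = KMeansUpdate([0.0, 5.0])
print(run_imru_threaded(mappers, reduce_stats, update, update.centroids))  # (2.0, 11.0)
```

> Running only the Map calls in parallel keeps Reduce and Update single-threaded, so the harness exercises the same data flow a distributed implementation would without needing a cluster.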
> This JIRA serves as an umbrella for work leading to an IMRU implementation on REEF.

This message was sent by Atlassian JIRA
