reef-dev mailing list archives

From "Andrew Chung (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (REEF-580) Add a Block Management Service to REEF
Date Mon, 06 Jun 2016 22:54:21 GMT

     [ https://issues.apache.org/jira/browse/REEF-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Chung updated REEF-580:
------------------------------
    Assignee:     (was: Andrew Chung)

> Add a Block Management Service to REEF
> --------------------------------------
>
>                 Key: REEF-580
>                 URL: https://issues.apache.org/jira/browse/REEF-580
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF-IO, REEF.NET IO
>            Reporter: Markus Weimer
>         Attachments: REEF Block Management Design.docx
>
>
> We propose the addition of a data Block Management service to REEF. The Block Manager
manages the transient data of a Big Data application. The Block Manager assumes that transient
data can be managed in the following hierarchy:
>   * *Data Set:* A data set consists of a set of (physical) partitions. For instance,
a folder on HDFS could be considered a data set, while its files constitute the partitions.
>   * *Partition:* A physical partition of a data set. In the example above, it would be
a file. Partitions consist of Blocks.
>   * *Block:* The atomic unit of data management. Each Block belongs to exactly one Partition.
Blocks are immutable. Blocks can be stored in Evaluator memory, on local disk, or in stable,
distributed storage. Blocks can have replicas across these storage tiers. Blocks contain data
of arbitrary format. From the perspective of this Block Management service, they are large,
fixed-size byte arrays.
> The purpose of the Block Manager is to manage the metadata and movement of data sets
organized in this hierarchy. To facilitate that, each Block, Partition and Data Set has a
unique ID.
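
The hierarchy described above can be sketched as a small data model. This is only an illustration, not part of the proposal or of any REEF API: the class names, field names, and the `new_id` and `add_block` helpers are hypothetical, and random UUIDs are assumed as the ID scheme because the issue does not specify one.

```python
from dataclasses import dataclass, field
from typing import List
import uuid


def new_id() -> str:
    """Return a fresh unique ID (assumption: any unique string would do)."""
    return uuid.uuid4().hex


@dataclass(frozen=True)
class Block:
    """Immutable, opaque byte array that belongs to exactly one partition."""
    block_id: str
    partition_id: str
    data: bytes


@dataclass
class Partition:
    """Physical partition of a data set, for example one file in a folder."""
    partition_id: str
    data_set_id: str
    block_ids: List[str] = field(default_factory=list)


@dataclass
class DataSet:
    """Set of partitions, for example a folder on HDFS."""
    data_set_id: str
    partition_ids: List[str] = field(default_factory=list)


def add_block(partition: Partition, data: bytes) -> Block:
    """Create a new Block for the given Partition and record its ID there."""
    block = Block(new_id(), partition.partition_id, bytes(data))
    partition.block_ids.append(block.block_id)
    return block
```

Marking `Block` as frozen mirrors the statement that Blocks are immutable: new data is added by creating a new Block rather than by changing an existing one.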
> On the *Task side*, the Block Manager facilitates the retrieval of and access to any
Block or Partition by its ID. Specific access methods are yet to be designed (e.g. whether
or not there is an order to the Blocks). Also, new Blocks can be created on the Task side
for a given Partition. Special consideration shall be given to the memory allocation efficiency
of this operation.
> On the *Driver side*, the Block Manager keeps track of the metadata of all Blocks. It
provides a network protocol used by the Task side components to retrieve and update metadata
records. Metadata can be kept in memory or, in a later version, in stable storage such as
a SQL database.
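
As a rough illustration of the Driver-side bookkeeping, the sketch below keeps Block metadata in memory and records where each replica lives. Everything here is an assumption made for the example: `InMemoryMetadataStore`, `BlockRecord`, `Tier`, and the method names are hypothetical, and the network protocol that Task-side components would use to call these operations is left out because the issue does not define it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Tier(Enum):
    """Storage tiers named in the proposal."""
    EVALUATOR_MEMORY = "evaluator_memory"
    LOCAL_DISK = "local_disk"
    STABLE_STORAGE = "stable_storage"


@dataclass
class BlockRecord:
    """Metadata for one Block: its owner and the known replicas."""
    block_id: str
    partition_id: str
    # Each replica is a (location, tier) pair; the location format is assumed.
    replicas: Set[Tuple[str, Tier]] = field(default_factory=set)


class InMemoryMetadataStore:
    """Keeps all Block metadata in a dictionary on the Driver."""

    def __init__(self) -> None:
        self._records: Dict[str, BlockRecord] = {}

    def register_block(self, block_id: str, partition_id: str) -> BlockRecord:
        """Record a newly created Block; Block IDs must be unique."""
        if block_id in self._records:
            raise ValueError(f"block {block_id!r} is already registered")
        record = BlockRecord(block_id, partition_id)
        self._records[block_id] = record
        return record

    def add_replica(self, block_id: str, location: str, tier: Tier) -> None:
        """Note that a copy of the Block now exists at the given location."""
        self._records[block_id].replicas.add((location, tier))

    def lookup(self, block_id: str) -> BlockRecord:
        """Return the metadata record for a Block (KeyError if unknown)."""
        return self._records[block_id]

    def blocks_of(self, partition_id: str) -> List[str]:
        """Return the IDs of all Blocks of a Partition, in registration order."""
        return [r.block_id for r in self._records.values()
                if r.partition_id == partition_id]
```

A later version backed by stable storage could keep the same four operations and replace only the dictionary.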
> The Block Management service shall be built in a language- and platform-agnostic manner.
At the very least, the Driver side network protocol needs to be accessible by both JVM and
CLR implementations of the Task side. REST could be an appropriate approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
