Return-Path:
X-Original-To: apmail-reef-dev-archive@minotaur.apache.org
Delivered-To: apmail-reef-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B659018B6D for ; Wed, 2 Dec 2015 20:16:11 +0000 (UTC)
Received: (qmail 70015 invoked by uid 500); 2 Dec 2015 20:16:11 -0000
Delivered-To: apmail-reef-dev-archive@reef.apache.org
Received: (qmail 69928 invoked by uid 500); 2 Dec 2015 20:16:11 -0000
Mailing-List: contact dev-help@reef.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@reef.apache.org
Delivered-To: mailing list dev@reef.apache.org
Received: (qmail 69701 invoked by uid 99); 2 Dec 2015 20:16:11 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Dec 2015 20:16:11 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 5C8492C1F78 for ; Wed, 2 Dec 2015 20:16:11 +0000 (UTC)
Date: Wed, 2 Dec 2015 20:16:11 +0000 (UTC)
From: "Andrew Chung (JIRA)"
To: dev@reef.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Assigned] (REEF-580) Add a Block Management Service to REEF
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/REEF-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Chung reassigned REEF-580:
---------------------------------

    Assignee: Andrew Chung

> Add a Block Management Service to REEF
> --------------------------------------
>
>                 Key: REEF-580
>                 URL: https://issues.apache.org/jira/browse/REEF-580
>             Project: REEF
>          Issue Type: New Feature
>            Reporter: Markus Weimer
>            Assignee: Andrew Chung
>         Attachments: REEF Block Management Design.docx
>
>
> We propose the addition of a data Block Management service to REEF.
> The Block Manager manages the transient data of a Big Data application. The Block Manager assumes that transient data can be managed in the following hierarchy:
> * *Data Set:* A data set consists of a set of (physical) partitions. For instance, a folder on HDFS could be considered a data set, while its files constitute the partitions.
> * *Partition:* A physical partition of a data set. In the example above, it would be a file. Partitions consist of Blocks.
> * *Block:* The atomic unit of data management. Each block belongs to exactly one partition. Blocks are immutable. Blocks can be stored in Evaluator memory, on local disk, or in stable, distributed storage. Blocks can have replicas across these storage tiers. Blocks contain data of arbitrary format. From the perspective of this Block Management service, they are large, fixed-size byte arrays.
> The purpose of the Block Manager is to manage the metadata and movement of data sets organized in such a way. To facilitate that, each Block, Partition and DataSet has a unique ID.
> On the *Task side*, the Block Manager facilitates the retrieval of and access to any Block or Partition by its ID. Specific access methods are yet to be designed (e.g. whether or not there is an order to the blocks). Also, new Blocks can be created on the Task side for a given Partition. Special consideration shall be given to the memory allocation efficiency of this operation.
> On the *Driver side*, the Block Manager keeps track of the metadata of all Blocks. It provides a network protocol used by the Task side components to retrieve and update metadata records. Metadata can be kept in memory or, in a later version, in stable storage such as a SQL database.
> The Block Management service shall be built in a language- and platform-agnostic manner. At the very least, the Driver side network protocol needs to be accessible by both JVM and CLR implementations of the Task side. REST could be an appropriate approach.
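
To make the Driver-side bookkeeping described above more concrete, here is a minimal Java sketch of an in-memory metadata registry. It is only an illustration under stated assumptions: the issue does not define any API, so every name here (BlockMetadataSketch, Tier, BlockMetadata, DriverBlockRegistry and their methods) is hypothetical and not part of REEF. The Data Set level and the network protocol are left out for brevity; the sketch only tracks which Blocks belong to which Partition and on which tiers their replicas live.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from REEF. */
final class BlockMetadataSketch {

  /** The three storage tiers named in the proposal. */
  enum Tier { EVALUATOR_MEMORY, LOCAL_DISK, STABLE_STORAGE }

  /** Immutable metadata record for one Block (the data itself is not held here). */
  static final class BlockMetadata {
    final String blockId;
    final String partitionId;
    final Set<Tier> replicaTiers;

    BlockMetadata(String blockId, String partitionId, Set<Tier> replicaTiers) {
      this.blockId = blockId;
      this.partitionId = partitionId;
      // Defensive copy so later changes to the caller's set cannot leak in.
      Set<Tier> copy = EnumSet.noneOf(Tier.class);
      copy.addAll(replicaTiers);
      this.replicaTiers = Collections.unmodifiableSet(copy);
    }
  }

  /** In-memory, Driver-side registry of Block metadata (assumed design). */
  static final class DriverBlockRegistry {
    private final Map<String, BlockMetadata> blocksById = new LinkedHashMap<>();
    private final Map<String, List<String>> blockIdsByPartition = new LinkedHashMap<>();

    /** Registers a new Block; IDs are unique, so duplicates are rejected. */
    synchronized void register(BlockMetadata block) {
      if (blocksById.containsKey(block.blockId)) {
        throw new IllegalArgumentException("Duplicate block ID: " + block.blockId);
      }
      blocksById.put(block.blockId, block);
      blockIdsByPartition
          .computeIfAbsent(block.partitionId, k -> new ArrayList<>())
          .add(block.blockId);
    }

    /** Records one more replica by swapping in a new metadata record. */
    synchronized void addReplica(String blockId, Tier tier) {
      BlockMetadata old = lookup(blockId);
      Set<Tier> tiers = EnumSet.noneOf(Tier.class);
      tiers.addAll(old.replicaTiers);
      tiers.add(tier);
      blocksById.put(blockId, new BlockMetadata(blockId, old.partitionId, tiers));
    }

    /** Returns the metadata for a Block, or throws if the ID is unknown. */
    synchronized BlockMetadata lookup(String blockId) {
      BlockMetadata block = blocksById.get(blockId);
      if (block == null) {
        throw new IllegalArgumentException("Unknown block ID: " + blockId);
      }
      return block;
    }

    /** Returns the IDs of all Blocks in a Partition, in registration order. */
    synchronized List<String> blockIdsOf(String partitionId) {
      return new ArrayList<>(
          blockIdsByPartition.getOrDefault(partitionId, Collections.<String>emptyList()));
    }
  }

  public static void main(String[] args) {
    DriverBlockRegistry registry = new DriverBlockRegistry();
    registry.register(
        new BlockMetadata("block-0", "partition-0", EnumSet.of(Tier.EVALUATOR_MEMORY)));
    registry.addReplica("block-0", Tier.LOCAL_DISK);
    System.out.println(registry.lookup("block-0").replicaTiers);
    System.out.println(registry.blockIdsOf("partition-0"));
  }
}
```

Because Blocks are immutable, the sketch only ever replaces metadata records and never touches block contents. A real implementation would put these operations behind the Driver-side network protocol (the issue suggests REST) so that both JVM and CLR Tasks can call them.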
-- This message was sent by Atlassian JIRA (v6.3.4#6332)