Date: Fri, 10 Jan 2014 00:20:50 +0000 (UTC)
From: "Arpit Agarwal (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeImpl interfaces

     [ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-5751:
--------------------------------

    Description:
The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes.

The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads.
This is useful both for unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity.

A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}.

However there are a few problems with this approach:
# Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug.
# There is the additional burden of maintaining two different dataset implementations.
# Fidelity between the two implementations is poor.

Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice.

  was:
The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes.

The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns blank data for all reads. This is useful both for unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity.

A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}.

However there are a few problems with this approach:
# Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug.
# There is the additional burden of maintaining two different dataset implementations.
# Fidelity between the two implementations is poor.

Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice.

> Remove the FsDatasetSpi and FsVolumeImpl interfaces
> ---------------------------------------------------
>
>                 Key: HDFS-5751
>                 URL: https://issues.apache.org/jira/browse/HDFS-5751
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, test
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>
> The in-memory block map and disk interface portions of the DataNode have been abstracted out into an {{FsDatasetSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual volumes.
> The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}} which does not write any data to disk. Instead it just stores block metadata in memory and returns zeroes for all reads. This is useful both for unit testing and for simulating arbitrarily large datanodes without having to provision real disk capacity.
> A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}} implement {{FsDatasetSpi}}.
> However there are a few problems with this approach:
> # Using the factory class significantly complicates the code flow for the common case. This makes the code harder to understand and debug.
> # There is the additional burden of maintaining two different dataset implementations.
> # Fidelity between the two implementations is poor.
> Instead we can eliminate the SPIs and just hide the disk read/write routines with a dependency injection framework like Google Guice.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
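
For illustration only, the direction proposed in the description (one dataset implementation, with just the disk read/write routines swapped out) might look roughly like the sketch below. Every name in it ({{BlockStore}}, {{ZeroBlockStore}}, {{InMemoryBlockStore}}, {{Dataset}}) is hypothetical and does not exist in Hadoop. It uses plain constructor injection instead of Guice so that it stays self-contained, and {{InMemoryBlockStore}} merely stands in for a disk-backed store.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are part of Hadoop.

/** The only piece that differs between a real and a simulated DataNode: raw block I/O. */
interface BlockStore {
    void write(long blockId, byte[] data);
    byte[] read(long blockId, int length);
}

/** Test-only store: discards writes and returns zeroes for every read. */
final class ZeroBlockStore implements BlockStore {
    @Override
    public void write(long blockId, byte[] data) {
        // Intentionally discard the data; nothing touches the disk.
    }

    @Override
    public byte[] read(long blockId, int length) {
        return new byte[length]; // Java zero-initialises new arrays.
    }
}

/** Stand-in for a disk-backed store; keeps bytes in a map so the sketch needs no files. */
final class InMemoryBlockStore implements BlockStore {
    private final Map<Long, byte[]> blocks = new HashMap<>();

    @Override
    public void write(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }

    @Override
    public byte[] read(long blockId, int length) {
        return Arrays.copyOf(blocks.get(blockId), length);
    }
}

/** Single dataset implementation: the block map logic is shared, only the store is injected. */
final class Dataset {
    private final BlockStore store;
    private final Map<Long, Integer> blockLengths = new HashMap<>();

    Dataset(BlockStore store) {
        this.store = store;
    }

    void addBlock(long blockId, byte[] data) {
        store.write(blockId, data);
        blockLengths.put(blockId, data.length); // metadata always lives in memory
    }

    byte[] readBlock(long blockId) {
        Integer length = blockLengths.get(blockId);
        if (length == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return store.read(blockId, length);
    }

    int blockCount() {
        return blockLengths.size();
    }
}
```

A test would construct {{new Dataset(new ZeroBlockStore())}}, while production code would pass the disk-backed store. With Guice, the same choice would presumably be expressed as a binding for {{BlockStore}} in a module rather than as a constructor argument at the call site.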