Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 12 Sep 2014 23:09:34 +0000 (UTC)
From: "Jing Zhao (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12722899.1403304022000.24026.1410563374465@Atlassian.JIRA>
In-Reply-To: <JIRA.12722899.1403304022000@Atlassian.JIRA>
References: <JIRA.12722899.1403304022000@Atlassian.JIRA>
 <JIRA.12722899.1403304022278@arcas>
Subject: [jira] [Commented] (HDFS-6584) Support Archival Storage
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132273#comment-14132273 ] 

Jing Zhao commented on HDFS-6584:
---------------------------------

bq. How does this interact with open files?
bq. Actually we should ignore the incomplete block which can be inferred from LocatedBlocks.

I checked the code. The current write pipeline appears to handle this scenario correctly, without causing data corruption: the namenode will not delete any replica when it finds that the block is still under construction. It would still be more efficient to avoid migrating under-construction blocks.
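
A minimal sketch of the filtering idea, assuming that only the last block of an open file can be incomplete and that the block listing exposes a flag saying whether the last block is complete. The types BlockInfo, FileBlocks and MigrationFilter are hypothetical stand-ins for illustration; they are not the HDFS classes, and this is not the code in any attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one block of a file (not the real HDFS type).
final class BlockInfo {
    final long blockId;

    BlockInfo(long blockId) {
        this.blockId = blockId;
    }
}

// Hypothetical stand-in for a file's block listing. Assumption: the list
// includes the last block, and lastBlockComplete is false while the file
// is still open for write.
final class FileBlocks {
    final List<BlockInfo> blocks;
    final boolean lastBlockComplete;

    FileBlocks(List<BlockInfo> blocks, boolean lastBlockComplete) {
        this.blocks = blocks;
        this.lastBlockComplete = lastBlockComplete;
    }
}

final class MigrationFilter {
    // Returns the blocks that are candidates for migration, dropping the
    // last block when it is still under construction.
    static List<BlockInfo> migratable(FileBlocks file) {
        List<BlockInfo> result = new ArrayList<>(file.blocks);
        if (!file.lastBlockComplete && !result.isEmpty()) {
            result.remove(result.size() - 1);
        }
        return result;
    }
}
```

With this shape, a closed file keeps all of its blocks as candidates, while an open file simply has its trailing block skipped until a later pass finds it complete.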

> Support Archival Storage
> ------------------------
>
>                 Key: HDFS-6584
>                 URL: https://issues.apache.org/jira/browse/HDFS-6584
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: balancer, namenode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, archival-storage-testplan.pdf, h6584_20140907.patch, h6584_20140908.patch, h6584_20140908b.patch, h6584_20140911.patch, h6584_20140911b.patch
>
>
> In most Hadoop clusters, as more and more data is stored for longer periods, the demand for storage is outstripping the demand for compute. Hadoop needs a cost-effective and easy-to-manage solution to meet this demand for storage. The current solutions are:
> - Delete old, unused data. This comes at the operational cost of identifying unnecessary data and deleting it manually.
> - Add more nodes to the cluster. This adds unnecessary compute capacity along with the storage capacity.
> Hadoop needs a solution that decouples growing storage capacity from compute capacity. Nodes with higher-density, less expensive storage and low compute power are becoming available and can be used as cold storage in the cluster. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to the cold storage can grow storage independently of the compute capacity in the cluster.

