hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6801) Archival Storage: Add a new data migration tool
Date Thu, 14 Aug 2014 20:58:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097643#comment-14097643
] 

Jing Zhao commented on HDFS-6801:
---------------------------------

Some quick comments:
# Currently the Mover always scans the whole namespace. Maybe we should allow users to specify
a list of paths for migration. This will also be useful in a shared cluster.
# Currently the Mover goes through the whole namespace and finishes all the check/schedule
work before starting the real migration work in the dispatcher. Going through the whole namespace
may take a long time, so maybe we should start dispatching as soon as some work has been
scheduled? But we can do this in a separate jira as an optimization.
# For a path ending with ".snapshot" (e.g., /foo/.snapshot/), {{getFileInfo}} can only return
a fake HdfsFileStatus. We may need to call {{getListing}} to get all the snapshots under the
snapshottable directory.
{code}
if (snapshottableDirs != null && snapshottableDirs.contains(dir)) {
  final String snapshotPath = dir + HdfsConstants.DOT_SNAPSHOT_DIR;
  try {
    final HdfsFileStatus snapshotFileInfo = dfs.getFileInfo(snapshotPath);
    processDirRecursively(snapshotPath, snapshotFileInfo);
{code}
# A file can be included in both the current fs directory and snapshots. Looks like the current
patch will schedule this kind of file multiple times since we process both the snapshot paths
and the normal paths? Will that cause any conflicts? We may want to only do extra processing
for files that have been deleted and only exist in snapshots.
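
To illustrate the last point, here is a minimal sketch of one way to avoid scheduling the same file twice. It is not taken from the patch: {{FileEntry}}, {{selectForMigration}} and the class name are hypothetical, and it assumes each scanned entry carries a stable inode id that is the same for the current path and for any snapshot path of the same file. It also ignores the case where a snapshot copy holds blocks that differ from the current file, which would need separate handling.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshotDedupSketch {
  /** Hypothetical stand-in for a scanned file: its path and inode id. */
  static final class FileEntry {
    final String path;
    final long fileId;

    FileEntry(String path, long fileId) {
      this.path = path;
      this.fileId = fileId;
    }
  }

  /**
   * Returns the entries to schedule: every file found under the current
   * tree, plus snapshot entries whose inode id has not been seen yet
   * (i.e. files that were deleted and now exist only in snapshots).
   */
  static List<FileEntry> selectForMigration(List<FileEntry> current,
                                            List<FileEntry> snapshot) {
    Set<Long> seen = new HashSet<>();
    List<FileEntry> result = new ArrayList<>();
    // Current-tree files take priority, so they are processed first.
    for (FileEntry e : current) {
      if (seen.add(e.fileId)) {
        result.add(e);
      }
    }
    // A snapshot entry is kept only if its inode id is new; this also
    // collapses the same file appearing in several snapshots.
    for (FileEntry e : snapshot) {
      if (seen.add(e.fileId)) {
        result.add(e);
      }
    }
    return result;
  }
}
```

With this approach a file that is still live is scheduled once through its normal path, and only deleted files are picked up through their snapshot paths.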

> Archival Storage: Add a new data migration tool 
> ------------------------------------------------
>
>                 Key: HDFS-6801
>                 URL: https://issues.apache.org/jira/browse/HDFS-6801
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer, namenode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h6801_20140813.patch, h6801_20140814.patch, h6801_20140814b.patch
>
>
> The tool is similar to Balancer.  It periodically scans the blocks in HDFS and uses path
> and/or other metadata (e.g. mtime) to determine if a block should be cooled down (i.e. hot
> => warm, or warm => cold) or warmed up (i.e. cold => warm, or warm => hot).  In
> contrast to Balancer, the migration tool always moves replicas to a different storage type.
> Similar to Balancer, the replicas are moved in a way that the number of racks the block
> spans does not decrease.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
