hadoop-hdfs-issues mailing list archives

From "feiwei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2139) Fast copy for HDFS.
Date Mon, 30 Oct 2017 16:35:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225268#comment-16225268 ]

feiwei commented on HDFS-2139:
------------------------------

In FsDatasetImpl.java, hardLinkOneBlock should be modified to ensure synchronization:

public void hardLinkOneBlock(ExtendedBlock srcBlock, ExtendedBlock dstBlock) throws IOException {
  BlockLocalPathInfo blpi = getBlockLocalPathInfo(srcBlock);
  File src = new File(blpi.getBlockPath());
  File srcMeta = new File(blpi.getMetaPath());

  if (getVolume(srcBlock).getAvailable() < dstBlock.getNumBytes()) {
    throw new DiskOutOfSpaceException("Insufficient space for hardlink block " + srcBlock);
  }

  BlockPoolSlice dstBPS = getVolume(srcBlock).getBlockPoolSlice(dstBlock.getBlockPoolId());

  synchronized (this) {
    File dstBlockFile = dstBPS.hardLinkOneBlock(src, srcMeta, dstBlock.getLocalBlock());
    dstBlockFile = dstBPS.addBlock(dstBlock.getLocalBlock(), dstBlockFile);
    ReplicaInfo replicaInfo = new FinalizedReplica(dstBlock.getLocalBlock(), getVolume(srcBlock),
        dstBlockFile.getParentFile());
    volumeMap.add(dstBlock.getBlockPoolId(), replicaInfo);
  }
}
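
To illustrate the idea outside of the HDFS code base, here is a minimal, self-contained sketch of creating a hard link and registering it under a single lock. HardLinkSketch, replicaMap, getReplica and the "blk_<id>" file naming are hypothetical stand-ins for illustration only, not part of FsDatasetImpl or of the attached patches. The only real API used is java.nio.file.Files.createLink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a dataset that tracks one replica file per block ID. */
class HardLinkSketch {
    // Plays the role of volumeMap in the snippet above; guarded by "this".
    private final Map<Long, Path> replicaMap = new HashMap<>();

    /**
     * Hard-links srcBlockFile into dstDir as "blk_<dstBlockId>" and registers it.
     * Link creation and registration happen under the same lock, so callers that
     * go through this object's synchronized methods never see a half-registered replica.
     */
    public Path hardLinkOneBlock(Path srcBlockFile, Path dstDir, long dstBlockId) throws IOException {
        Path dst = dstDir.resolve("blk_" + dstBlockId);
        synchronized (this) {
            if (replicaMap.containsKey(dstBlockId)) {
                throw new IOException("Replica already registered for block " + dstBlockId);
            }
            // Files.createLink(link, existing) adds a second directory entry for the same file data.
            Files.createLink(dst, srcBlockFile);
            replicaMap.put(dstBlockId, dst);
        }
        return dst;
    }

    /** Returns the registered replica path, or null if the block is unknown. */
    public synchronized Path getReplica(long blockId) {
        return replicaMap.get(blockId);
    }
}
```

In this sketch the free-space check is left out; whether it also belongs inside the lock in FsDatasetImpl is a separate question that the comment above does not address.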

> Fast copy for HDFS.
> -------------------
>
>                 Key: HDFS-2139
>                 URL: https://issues.apache.org/jira/browse/HDFS-2139
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Pritam Damania
>            Assignee: Rituraj
>         Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file
> works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing
> top-of-rack data transfers.
> Note: A further improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
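
The steps in the quoted description can be sketched roughly as follows. This is an illustrative outline only: the NameNode and DataNode interfaces and all of their methods are hypothetical placeholders, not HDFS APIs, and step 5 (the datanode reporting back to the namenode) is assumed to happen inside copyBlockLocally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class FastCopySketch {
    /** Hypothetical datanode handle; not an HDFS API. */
    interface DataNode {
        // Step 4 (and, by assumption, step 5): copy or hard-link the block locally.
        void copyBlockLocally(long srcBlockId, long dstBlockId) throws Exception;
    }

    /** Hypothetical namenode view; not an HDFS API. */
    interface NameNode {
        List<Long> getBlocks(String file);           // step 1
        List<DataNode> getLocations(long blockId);   // step 2
        long addEmptyBlock(String dstFile);          // step 3
    }

    static void fastCopy(NameNode nn, String src, String dst, ExecutorService pool) throws Exception {
        List<Future<?>> pending = new ArrayList<>();
        for (long srcBlock : nn.getBlocks(src)) {
            long dstBlock = nn.addEmptyBlock(dst);
            for (DataNode dn : nn.getLocations(srcBlock)) {
                // Each replica is copied on the datanode that already holds it,
                // so no block data crosses the network.
                pending.add(pool.submit(() -> {
                    dn.copyBlockLocally(srcBlock, dstBlock);
                    return null;
                }));
            }
        }
        // Step 6: wait for every local copy; get() rethrows any failure.
        for (Future<?> f : pending) {
            f.get();
        }
    }
}
```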



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

