hadoop-common-dev mailing list archives

From "Bryan Keller (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-7611) SequenceFile.Sorter creates local temp files on HDFS
Date Mon, 05 Sep 2011 17:46:09 GMT
SequenceFile.Sorter creates local temp files on HDFS

                 Key: HADOOP-7611
                 URL: https://issues.apache.org/jira/browse/HADOOP-7611
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.2
         Environment: CentOS 5.6 64-bit, Oracle JDK 1.6.0_26 64-bit
            Reporter: Bryan Keller

When using SequenceFile.Sorter to sort or merge sequence files that exist in HDFS, it attempts
to create temp files in a directory structure specified by mapred.local.dir but on HDFS, not
in the local file system. The problem code is in MergeQueue.merge(). Starting at line 2953:
    Path outputFile = lDirAlloc.getLocalPathForWrite(
        approxOutputSize, conf);
    LOG.debug("writing intermediate results to " + outputFile);
    Writer writer = cloneFileAttributes(
        fs.makeQualified(outputFile), null);
The outputFile here is a local path without a scheme, e.g. "/mnt/mnt1/mapred/local", taken
from the mapred.local.dir property. If we are sorting files on HDFS, the fs object is a
DistributedFileSystem. The call to fs.makeQualified(outputFile) prepends the fs object's
scheme to the local temp path returned by lDirAlloc, e.g. hdfs:///mnt/mnt1/mapred/local.
This directory is then created on HDFS (if the proper permissions are available). If the
HDFS permissions are not available, the sort/merge fails even though the directories exist
locally.

The code should instead always use the local file system when resolving a path obtained
from the mapred.local.dir property. The unit tests do not cover this condition; they only
exercise sort and merge on the local file system.
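
The effect described above can be modelled without Hadoop on the classpath. The sketch below
is only an illustration: qualify() is a hypothetical, simplified stand-in for what
FileSystem.makeQualified does to a scheme-less path, and "namenode:8020" is an assumed
NameNode address, not one taken from this report. It shows the same temp path landing on
HDFS when qualified against the job's file system, and staying local when qualified against
the local file system, which is the behaviour the suggested fix is after.

```java
import java.net.URI;
import java.net.URISyntaxException;

class QualifyDemo {
    // Simplified stand-in for FileSystem.makeQualified (an assumption, not
    // Hadoop's actual code): a path with no scheme inherits the scheme and
    // authority of the file system it is qualified against.
    static URI qualify(URI fsUri, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already qualified, leave untouched
        }
        try {
            return new URI(fsUri.getScheme(), fsUri.getAuthority(),
                           p.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://namenode:8020"); // hypothetical address
        URI local = URI.create("file:///");
        String tmp = "/mnt/mnt1/mapred/local";         // example from the report

        // Qualifying against the HDFS file system sends the temp path to HDFS.
        System.out.println("qualified against fs:    " + qualify(hdfs, tmp));
        // Qualifying against the local file system keeps it on local disk.
        System.out.println("qualified against local: " + qualify(local, tmp));
    }
}
```

In Hadoop itself, the analogous change would presumably be to qualify outputFile against the
local file system (for example one obtained from FileSystem.getLocal(conf)) rather than
against fs; that has not been verified against the 0.20.2 source here.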

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

