hadoop-common-issues mailing list archives

From "Bryan Keller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7611) SequenceFile.Sorter creates local temp files on HDFS
Date Tue, 06 Sep 2011 15:08:10 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098070#comment-13098070
] 

Bryan Keller commented on HADOOP-7611:
--------------------------------------

I believe the intent was to have the intermediate files on the same file system as the result.
Simply removing the step that prepends the mapred.local.dir path works, since the intermediate
file names are already generated in a way that avoids conflicts. I have modified SequenceFile to
this effect, and the sort/merge works properly without creating unintended directories on HDFS.
Here is one of the changes (there are a couple of other minor changes to support it).
{code}
//            Path outputFile =  lDirAlloc.getLocalPathForWrite(
//                                                tmpFilename.toString(),
//                                                approxOutputSize, conf);
//            LOG.debug("writing intermediate results to " + outputFile);
//
//            Writer writer = cloneFileAttributes(
//                    fs.makeQualified(segmentsToMerge.get(0).segmentPathName),
//                    fs.makeQualified(outputFile), null);
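            // Using tmpFilename directly (qualified against fs below) keeps the
            // intermediate file on the same file system as the final output.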
            LOG.debug("writing intermediate results to " + tmpFilename);

            Writer writer = cloneFileAttributes(
                    fs.makeQualified(segmentsToMerge.get(0).segmentPathName),
                    fs.makeQualified(tmpFilename), null);
{code}
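For reference, this is roughly the kind of call that exercises the affected code path. It is only
an illustrative sketch, not part of the patch; the class name, paths, and key/value types below are
made up, and it assumes fs.default.name points at an HDFS namenode so FileSystem.get(conf) returns
a DistributedFileSystem.
{code}
// Illustrative sketch only -- hypothetical class name, paths, and key/value types.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MergeOnHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // With fs.default.name set to an hdfs:// URI, this is a DistributedFileSystem.
    FileSystem fs = FileSystem.get(conf);

    SequenceFile.Sorter sorter =
        new SequenceFile.Sorter(fs, Text.class, IntWritable.class, conf);

    Path[] inputs = { new Path("/data/part-00000"), new Path("/data/part-00001") };
    Path merged = new Path("/data/merged");

    // If the merge needs intermediate passes (more inputs than the merge factor),
    // MergeQueue.merge() writes intermediate segments; before the change above they
    // were allocated under mapred.local.dir but qualified against the HDFS fs.
    sorter.merge(inputs, merged);
  }
}
{code}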

> SequenceFile.Sorter creates local temp files on HDFS
> ----------------------------------------------------
>
>                 Key: HADOOP-7611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7611
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.2
>         Environment: CentOS 5.6 64-bit, Oracle JDK 1.6.0_26 64-bit
>            Reporter: Bryan Keller
>
> When using SequenceFile.Sorter to sort or merge sequence files that exist in HDFS, it
> attempts to create temp files in a directory structure specified by mapred.local.dir but on
> HDFS, not in the local file system. The problem code is in MergeQueue.merge(), starting at
> line 2953:
> {code}
>             Path outputFile =  lDirAlloc.getLocalPathForWrite(
>                                                 tmpFilename.toString(),
>                                                 approxOutputSize, conf);
>             LOG.debug("writing intermediate results to " + outputFile);
>             Writer writer = cloneFileAttributes(
>                                                 fs.makeQualified(segmentsToMerge.get(0).segmentPathName),
>                                                 fs.makeQualified(outputFile), null);
> {code}
> The outputFile here is a local path without a scheme, e.g. "/mnt/mnt1/mapred/local",
> specified by the mapred.local.dir property. If we are sorting files on HDFS, the fs object
> is a DistributedFileSystem. The call to fs.makeQualified(outputFile) prepends the fs object's
> scheme to the local temp path returned by lDirAlloc, e.g. hdfs://mnt/mnt1/mapred/local. This
> directory is then created (if the proper permissions are available) on HDFS. If the HDFS
> permissions are not available, the sort/merge fails even though the directories exist locally.
> The code should instead always use the local file system when retrieving a path from the
> mapred.local.dir property. The unit tests do not test this condition; they only test using
> the local file system for sort and merge.
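
To make the behavior described above concrete, here is a minimal standalone sketch (hypothetical
class and path names, not code from SequenceFile) showing that FileSystem.makeQualified attaches
the HDFS scheme to a scheme-less local path, while qualifying the same path against the local
file system keeps it local, which is the direction the last paragraph suggests:
{code}
// Standalone illustration with hypothetical names; run against a conf whose
// fs.default.name points at HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path localTemp = new Path("/mnt/mnt1/mapred/local/intermediate.1");

    // Qualified against the default (HDFS) file system, the path picks up the hdfs
    // scheme and authority, e.g. hdfs://namenode:8020/mnt/mnt1/mapred/local/intermediate.1
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.makeQualified(localTemp));

    // Qualified against the local file system, it stays local,
    // e.g. file:/mnt/mnt1/mapred/local/intermediate.1
    FileSystem localFs = FileSystem.getLocal(conf);
    System.out.println(localFs.makeQualified(localTemp));
  }
}
{code}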

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
