crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-252) The driver program may copy huge of data if the output path is not on the same fs of working dir
Date Fri, 16 Aug 2013 05:34:47 GMT
Chao Shi created CRUNCH-252:
-------------------------------

             Summary: The driver program may copy huge of data if the output path is not on the same fs of working dir
                 Key: CRUNCH-252
                 URL: https://issues.apache.org/jira/browse/CRUNCH-252
             Project: Crunch
          Issue Type: Bug
            Reporter: Chao Shi


I ran into this problem when running a pipeline of MapReduce jobs on cluster A with the final
output written to cluster B. I don't want to simply point the working dir at B, because we want
all intermediate output to stay on A.

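Here is the stack trace of the driver program while it copies the output. The thread is inside
FileUtil.copy, called from FileTargetImpl.handleOutputs, so the data is read and written by the
driver process itself.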

"Thread-15" prio=10 tid=0x00007faa90130800 nid=0x3e73 runnable [0x00007faa874ed000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:240)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        - locked <0x00000000c45d8128> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
        at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1249)
        - locked <0x00000000c45d8128> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1899)
        - locked <0x00000000c2ccb608> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1951)
        - locked <0x00000000c2ccb608> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:89)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
        at org.apache.crunch.io.impl.FileTargetImpl.handleOutputs(FileTargetImpl.java:109)
        at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:87)
        - locked <0x00000000c18e6fb8> (a org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook)
        at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.run(CrunchJobHooks.java:79)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkRunningState(CrunchControlledJob.java:251)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkState(CrunchControlledJob.java:261)
        - locked <0x00000000c18e6ff8> (a org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.checkRunningJobs(CrunchJobControl.java:170)
        - locked <0x00000000c18e7028> (a org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:221)
        at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:101)
        at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:52)
        at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:76)
        at java.lang.Thread.run(Thread.java:662)
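
To make the condition in the summary concrete, here is a minimal sketch of how a driver could tell whether the working dir and the output path live on the same filesystem. It is only an illustration under assumptions: OutputMoveCheck, sameFileSystem and moveStrategy are hypothetical names, not Crunch or Hadoop APIs, and the check compares nothing more than URI scheme and authority (so hdfs://clusterA and hdfs://clusterA:8020 would be treated as different).

```java
import java.net.URI;

/** Hypothetical helper for illustration only; not part of Crunch or Hadoop. */
final class OutputMoveCheck {

    private OutputMoveCheck() {}

    /**
     * Returns true when both URIs have the same scheme and authority,
     * for example the same HDFS namenode. Assumption: this simple
     * comparison is a good enough proxy for "same filesystem".
     */
    static boolean sameFileSystem(URI a, URI b) {
        return equalsIgnoreCase(a.getScheme(), b.getScheme())
            && equalsIgnoreCase(a.getAuthority(), b.getAuthority());
    }

    /**
     * Returns "rename" when the output can be moved in place, and "copy"
     * when the bytes would have to be transferred between filesystems.
     */
    static String moveStrategy(URI workingDir, URI outputPath) {
        return sameFileSystem(workingDir, outputPath) ? "rename" : "copy";
    }

    // Null-safe, case-insensitive comparison: two nulls count as equal.
    private static boolean equalsIgnoreCase(String x, String y) {
        if (x == null || y == null) {
            return x == y;
        }
        return x.equalsIgnoreCase(y);
    }
}
```

With a working dir of hdfs://clusterA:8020/tmp/crunch and an output path of hdfs://clusterB:8020/data/out (made-up addresses), moveStrategy returns "copy", which is the case reported here. For large outputs, a driver that detects this case could hand the transfer to a distributed job instead of copying in its own thread.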


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
