Shane Kumpf created MAPREDUCE-5740:
--------------------------------------
Summary: Shuffle error when the MiniMRYARNCluster work path contains special
characters
Key: MAPREDUCE-5740
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5740
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Shane Kumpf
Priority: Minor
Tests that use MiniMRYARNCluster fail during the Jenkins build but succeed on local
workstations.
The exception found is as follows:
{quote}
2014-01-30 10:59:28,649 ERROR [ShuffleHandler.java:510] Shuffle error :
java.io.IOException: Error Reading IndexFile
at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123)
at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68)
at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:592)
at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /home/sitebuild/jenkins/workspace/%7Binventory-engineering%7D-snapshot-workflow-%7BS7274%7D/target/Integration-Tests/Integration-Tests-localDir-nm-0_2/usercache/sitebuild/appcache/application_1391108343099_0001/output/attempt_1391108343099_0001_m_000000_0/file.out.index
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:210)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
... 32 more
{quote}
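It was found that org.apache.hadoop.mapred.SpillRecord calls toUri() on the indexFileName Path
object (line 71) and then uses the raw, still-encoded path. Jenkins uses {} in the workspace name
to denote team and branch. These characters are percent-encoded ({ becomes %7B and } becomes %7D),
so the resulting path does not exist on disk, which causes the FileNotFoundException during the
shuffle phase.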
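The encoding step can be illustrated with the JDK alone. The following is a minimal sketch, not Hadoop code: it assumes that Path builds its URI through one of the multi-argument java.net.URI constructors, which quote characters that are illegal in a URI path. The class name BraceEncodingDemo and the sample workspace path are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class BraceEncodingDemo {
    // Builds a URI from a plain path string. The multi-argument URI
    // constructors quote characters that are illegal in a path, such as { and }.
    // (Assumption: Hadoop's Path creates its URI in a comparable way.)
    static URI toUri(String path) {
        try {
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = toUri("/workspace/{team}-snapshot-{branch}/file.out.index");
        // Raw path keeps the percent-escapes: .../%7Bteam%7D-snapshot-%7Bbranch%7D/...
        System.out.println(uri.getRawPath());
        // Decoded path matches the directory name as it exists on disk.
        System.out.println(uri.getPath());
    }
}
```

getRawPath() returns the escaped form, while getPath() returns the decoded form that corresponds to the real directory name.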
Interestingly, the SpillRecord constructor, shown below, calls Path.toUri() surprisingly high up
in the call chain:
{code}
public SpillRecord(Path indexFileName, JobConf job, Checksum crc, String expectedIndexOwner)
    throws IOException {
  final FileSystem rfs = FileSystem.getLocal(job).getRaw();
  final FSDataInputStream in =
      SecureIOUtils.openFSDataInputStream(new File(indexFileName.toUri().getRawPath()),
          expectedIndexOwner, null);
  ....
}
{code}
and SecureIOUtils creates a Path from the File object (!):
{code}
public static FSDataInputStream openFSDataInputStream(File file,
    String expectedOwner, String expectedGroup) throws IOException {
  if (!UserGroupInformation.isSecurityEnabled()) {
    return rawFilesystem.open(new Path(file.getAbsolutePath()));
  }
  return forceSecureOpenFSDataInputStream(file, expectedOwner, expectedGroup);
}
{code}
The rawFilesystem.open(Path) call above is declared on the abstract class FileSystem and is
dispatched at runtime to a concrete subclass, which could be any of:
• ChRootedFileSystem
• ChecksumFileSystem
• DistributedFileSystem
• FTPFileSystem
• WebHdfsFileSystem
• and others
URL escaping makes sense for WebHdfsFileSystem and some others, but not for all. It seems
more sensible to URL-escape only within the FileSystem implementations that require it.
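To illustrate why escaping does not suit a local file system, here is a small JDK-only sketch that mirrors the failure. It is not Hadoop code: the class name RawPathLookupDemo and the directory prefix are hypothetical, java.io.File.toURI() stands in for Path.toUri(), and a Unix-like file system is assumed. The file is created under a directory whose name contains braces and is then looked up once through the raw URI path and once through the decoded one.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;

final class RawPathLookupDemo {
    // Returns {existsViaRawPath, existsViaDecodedPath} for an index file that
    // lives under a temporary directory whose name contains { and }.
    static boolean[] probe() {
        try {
            File dir = Files.createTempDirectory("{team}-snapshot-").toFile();
            File index = new File(dir, "file.out.index");
            if (!index.createNewFile()) {
                throw new IOException("could not create " + index);
            }
            try {
                URI uri = index.toURI();
                // Raw path still contains %7B and %7D, which are not on disk.
                boolean viaRawPath = new File(uri.getRawPath()).exists();
                // Decoded path contains the literal braces again.
                boolean viaDecodedPath = new File(uri.getPath()).exists();
                return new boolean[] {viaRawPath, viaDecodedPath};
            } finally {
                // Best-effort cleanup of the temporary file and directory.
                index.delete();
                dir.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] found = probe();
        System.out.println("raw path exists: " + found[0]);
        System.out.println("decoded path exists: " + found[1]);
    }
}
```

Under these assumptions only the decoded path should resolve, which suggests that a local lookup would want getPath() (or the original Path) rather than getRawPath().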
Also of note: MiniMRYarnCluster allows most of the directories it uses to be changed via
org.apache.hadoop.yarn.conf.YarnConfiguration; however, testWorkDir is not one of them. It is
hardcoded as follows in org.apache.hadoop.yarn.server.MiniYARNCluster:
{code}
public MiniYARNCluster(String testName, int noOfNodeManagers,
    int numLocalDirs, int numLogDirs) {
  super(testName.replace("$", ""));
  this.numLocalDirs = numLocalDirs;
  this.numLogDirs = numLogDirs;
  this.testWorkDir = new File("target",
      testName.replace("$", ""));
  ....
}
{code}
If modifications to SpillRecord are undesirable, allowing testWorkDir to be configurable might
be a good workaround.
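One possible shape for that workaround is sketched below. This is only an illustration under stated assumptions: the class TestWorkDirResolver and the property name miniyarn.test.work.dir.base are hypothetical and do not exist in Hadoop or YARN. The base directory is read from a system property and falls back to the current hardcoded "target".

```java
import java.io.File;

final class TestWorkDirResolver {
    // Hypothetical property name, invented for this sketch; it is not an
    // existing Hadoop or YARN configuration key.
    static final String BASE_DIR_PROPERTY = "miniyarn.test.work.dir.base";

    // Resolves the work directory for a test, defaulting to "target" so that
    // behavior is unchanged when the property is not set.
    static File resolve(String testName) {
        String base = System.getProperty(BASE_DIR_PROPERTY, "target");
        return new File(base, testName.replace("$", ""));
    }

    public static void main(String[] args) {
        System.out.println(resolve("Integration-Tests"));
    }
}
```

A build could then point the property at a directory without special characters, for example by passing -Dminiyarn.test.work.dir.base=/tmp/minicluster to the test JVM.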