hadoop-yarn-issues mailing list archives

From "Omkar Vinit Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
Date Thu, 03 Oct 2013 00:50:45 GMT

    [ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784708#comment-13784708 ]

Omkar Vinit Joshi commented on YARN-1219:

bq. I didn't see anywhere in the code that treats the ".tmp" file differently. If you know of one, please
let me know. If the original author only used the suffix to make sure the name differs
from the original file name, it doesn't seem worth adding unnecessary and error-prone
rename operations just to keep the temporary file name suffix.
No, we are not adding new renames, just moving them from unpack() to here. Ideally, that rename
code should have been here in the first place. I remember we had a bug to remove that .tmp file.
But I think it is fine to go ahead with this patch, as it will not break anything else.
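To make the naming trade-off in this exchange concrete, here is a minimal Java sketch. It is not the YARN-1219 patch and not FSDownload's actual code; the `StagedDownload` class and its `stage()` helper are hypothetical. It assumes the download is written under its original file name inside a sibling `<destDir>_tmp` work directory, which is then renamed into place, so the temporary location still differs from the final one without changing the file's suffix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StagedDownload {
    // Hypothetical helper: stage `data` under its ORIGINAL name in "<destDir>_tmp",
    // then rename the work directory to destDir. The staged file keeps its real
    // suffix (e.g. ".tar.gz"), so a suffix-based gzip check still sees it.
    static Path stage(Path destDir, String fileName, byte[] data) throws IOException {
        Path workDir = destDir.resolveSibling(destDir.getFileName() + "_tmp");
        Files.createDirectories(workDir);
        Path staged = workDir.resolve(fileName);
        Files.write(staged, data);
        // An unpack step would run here, on `staged`, before the final rename.
        // ATOMIC_MOVE assumes workDir and destDir are on the same file system.
        Files.move(workDir, destDir, StandardCopyOption.ATOMIC_MOVE);
        return destDir.resolve(fileName);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("yarn1219");
        Path result = stage(base.resolve("cache"), "archive.tar.gz", new byte[] {1, 2, 3});
        System.out.println(result.getFileName()); // archive.tar.gz
    }
}
```

Compared with appending ".tmp" to the file name, this keeps a single rename (of the directory) while leaving the archive's extension intact for the unpack step.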

> FSDownload changes file suffix making FileUtil.unTar() throw exception
> ----------------------------------------------------------------------
>                 Key: YARN-1219
>                 URL: https://issues.apache.org/jira/browse/YARN-1219
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>             Fix For: 2.1.2-beta
>         Attachments: YARN-1219.patch
> While running a Hive join operation on YARN, I saw the exception described below. It
> is caused by FSDownload copying the file into a temp file whose name ends in ".tmp"
> before unpacking it. In unpack(), it uses FileUtil.unTar(), which determines whether the file
> is gzipped by looking at the file suffix:
> {code}
> boolean gzipped = inFile.toString().endsWith("gz");
> {code}
> To fix this problem, we can remove the ".tmp" suffix from the temp file name.
> Here is the detailed exception:
> 	at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240)
> 	at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676)
> 	at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625)
> 	at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
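The suffix check quoted in the issue description can be reproduced in isolation. The following Java sketch (the `GzipDetection` class and its method names are hypothetical) shows why a ".tmp" suffix defeats `endsWith("gz")`. For contrast, it also shows a content-based check of the two-byte gzip magic number (0x1f 0x8b, from RFC 1952). That variant is an assumed alternative, not what `FileUtil.unTar()` does in the versions discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class GzipDetection {
    // Same test as the line quoted from FileUtil.unTar(): only the name is inspected.
    static boolean gzippedBySuffix(String fileName) {
        return fileName.endsWith("gz");
    }

    // Assumed alternative: peek at the first two bytes for the gzip magic number
    // (0x1f 0x8b, RFC 1952). The stream must support mark/reset so the bytes
    // can be pushed back for the actual tar reader.
    static boolean gzippedByMagic(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        return b1 == 0x1f && b2 == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(gzippedBySuffix("archive.tar.gz"));     // true
        System.out.println(gzippedBySuffix("archive.tar.gz.tmp")); // false: ".tmp" hides the gzip suffix
        byte[] header = {(byte) 0x1f, (byte) 0x8b, 0x08, 0x00};
        System.out.println(gzippedByMagic(new ByteArrayInputStream(header))); // true
    }
}
```

The suffix check is cheap but depends entirely on naming, which is why renaming the download to "*.tmp" broke it. A magic-number check is independent of the name, at the cost of needing a stream that supports mark/reset.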

This message was sent by Atlassian JIRA
