pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4916) Pig on Tez fail to remove temporary HDFS files in some cases
Date Wed, 01 Jun 2016 23:43:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311391#comment-15311391
] 

Chris Nauroth commented on PIG-4916:
------------------------------------

Hello [~daijy].  Thank you for the patch.  +1 (non-binding) from me.  I agree that it isn't
feasible to write a unit test for this.  We have confirmation from your manual testing that
it worked though.

> Pig on Tez fail to remove temporary HDFS files in some cases
> ------------------------------------------------------------
>
>                 Key: PIG-4916
>                 URL: https://issues.apache.org/jira/browse/PIG-4916
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-4916-1.patch
>
>
> We saw the following stack trace when running Pig on S3:
> {code}
> 2016-06-01 22:04:22,714 [Thread-19] INFO  org.apache.hadoop.service.AbstractService -
Service org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl failed in state STOPPED;
cause: java.io.IOException: Filesystem closed
> java.io.IOException: Filesystem closed
> 	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
> 	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
> 	at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
> 	at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
> 	at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
> 	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
> 	at org.apache.tez.client.TezClient.stop(TezClient.java:582)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> 2016-06-01 22:04:22,718 [Thread-19] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager
- Error shutting down Tez session org.apache.tez.client.TezClient@48bf833a
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed
> 	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:225)
> 	at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
> 	at org.apache.tez.client.TezClient.stop(TezClient.java:582)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> Caused by: java.io.IOException: Filesystem closed
> 	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
> 	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
> 	at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
> 	at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
> 	at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
> 	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	... 4 more
> {code}
> The job run successfully, but the temporary hdfs files are not removed.
> [~cnauroth] points out FileSystem also use shutdown hook to close FileSystem instances
and it might run before Pig's shutdown hook in Main. By switching to Hadoop's ShutdownHookManager,
we can put an order on shutdown hook.
> This has been verified by testing the following code in Main:
> {code}
>         ShutdownHookManager.get().addShutdownHook(new Runnable() {
>             @Override
>             public void run() {
>                 FileLocalizer.deleteTempResourceFiles();
>             }
>         }, priority);
> {code}
> Notice FileSystem.SHUTDOWN_HOOK_PRIORITY=10. When priority=9, Pig fail. When priority=11,
Pig success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message