hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
Date Mon, 08 Dec 2008 06:06:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654321#action_12654321 ]

Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

i quickly glanced at the doc mentioned. seems like hadoop will move the files automatically
to mapred.output.dir - but we don't have a single output directory for the task (since we
can have multiple outputs).

anyway - i am not sure why the problem happens (it almost seems like map-reduce can declare
a job complete while speculative tasks are still running) - but a trivial fix is to just
create the tmp files in a completely different directory (say, a scratch dir per query) and
then move the finished files from there. we can discard the scratch dir entirely on query
completion. there's still some risk of these runaway files leaking inodes/space. but if they
sit in known hive scratch dir locations - they can always be cleaned up later on.
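
a minimal sketch of the scratch-dir idea, for illustration only. it uses the local filesystem through Python's standard library rather than HDFS, and the function name, the directory layout and the "committed" set are all hypothetical (they are not Hive code). the point is just that only known-good outputs get moved, and anything a late speculative attempt writes stays in the scratch dir and is deleted with it.

```python
import shutil
import tempfile
from pathlib import Path


def commit_query_output(scratch_dir, final_dir, committed):
    """Move committed task outputs out of scratch_dir, then drop scratch_dir.

    `committed` is a set of file names known to come from successful task
    attempts (hypothetical bookkeeping, assumed to be available to the caller).
    """
    final_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in sorted(committed):
        src = scratch_dir / name
        if src.is_file():
            shutil.move(str(src), str(final_dir / name))
            moved.append(name)
    # Leftovers (e.g. _tmp files from runaway attempts) never reach final_dir;
    # they are removed together with the per-query scratch directory.
    shutil.rmtree(scratch_dir, ignore_errors=True)
    return moved


def demo():
    root = Path(tempfile.mkdtemp())
    scratch = root / "scratch" / "query_0001"  # hypothetical per-query scratch dir
    final = root / "output"
    scratch.mkdir(parents=True)
    (scratch / "000195_0").write_text("rows")   # output of a successful attempt
    (scratch / "_tmp.000033_0").write_text("")  # leftover from a failed attempt
    moved = commit_query_output(scratch, final, {"000195_0"})
    remaining = sorted(p.name for p in final.iterdir())
    shutil.rmtree(root)
    return moved, remaining
```

a file that shows up in the scratch dir after the move has run would still leak, which is the inode/space risk mentioned above; since the location is known, a later sweep can remove it.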

> insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-131
>                 URL: https://issues.apache.org/jira/browse/HIVE-131
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Priority: Critical
>
> _tmp files are getting left behind on insert overwrite directory:
> /user/jssarma/ctst1/40422_m_000195_0.deflate  <r 3> 13285 2008-12-07 01:47  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/40422_m_000196_0.deflate  <r 3> 3055  2008-12-07 01:46  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000033_0 <r 3> 0 2008-12-07 01:53  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000037_1 <r 3> 0 2008-12-07 01:53  rw-r--r-- jssarma supergroup
> this happened with speculative execution. the code looks good (in fact in this case many
> speculative tasks were launched - and only a couple caused problems). Almost seems like these
> files did not appear in the namespace until after the map-reduce job finished and the movetask
> did a listing of the output dir ..

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

