hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
Date Thu, 12 Feb 2009 05:32:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-131:
-----------------------------------

    Attachment: hive-131.patch.2

Dhruba said:

> 1. I see that execute returns values 1, 2, and 3. It would be good to
>    document what these values mean.
> 2. Starting with Hadoop 0.19, it might make sense to set
>    FileSystem.deleteOnExit() for files that are temporary.
> 3. It is interesting to note that there is now an extra step, jobClose(),
>    that gets triggered on the client side after the job is complete. Prior
>    to this patch, a job would be successful even if the client side had
>    disappeared before the job completed. This patch requires that the
>    client remain active and healthy until the entire job is complete. This
>    is probably OK for Hive, especially because Hive requires job chaining
>    anyway and I do not see any other way to do it.

- Incorporated the suggestion to use deleteOnExit where available.
- Return codes are always accompanied by a corresponding message on the
  console/log, so I don't see much point in creating additional documentation
  around them.
- Hive has always depended on a client-side code path for query completion.
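
To illustrate the client-side cleanup step discussed above, here is a minimal sketch in Python. It is not the code in the attached patch (which is Java and works against the Hadoop FileSystem API); it uses a local directory as a stand-in for the output directory, and the function name sweep_tmp_files is hypothetical. The only detail taken from this issue is the "_tmp." file-name prefix.

```python
import tempfile
from pathlib import Path

# Prefix of uncommitted task output, as seen in the issue's directory listing.
TMP_PREFIX = "_tmp."


def sweep_tmp_files(output_dir: Path) -> list:
    """Delete leftover temporary files and return their names (sorted).

    Hypothetical helper: a local-filesystem stand-in for a cleanup that the
    client would run once, after the whole job has finished.
    """
    removed = []
    for entry in sorted(output_dir.iterdir()):
        if entry.is_file() and entry.name.startswith(TMP_PREFIX):
            entry.unlink()
            removed.append(entry.name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d)
        for name in ("40422_m_000195_0.deflate", "_tmp.40422_m_000033_0"):
            (out / name).touch()
        print(sweep_tmp_files(out))                    # names that were removed
        print(sorted(p.name for p in out.iterdir()))   # committed output remains
```

Because the sweep runs on the client after the job ends, it also catches files that only become visible once every task attempt has stopped, which is the trade-off Dhruba's third point describes: the client has to stay alive until the job is complete.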

> insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-131
>                 URL: https://issues.apache.org/jira/browse/HIVE-131
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>         Attachments: HIVE-131.patch.1, hive-131.patch.2
>
>
> _tmp files are getting left behind on insert overwrite directory:
> /user/jssarma/ctst1/40422_m_000195_0.deflate  <r 3> 13285 2008-12-07 01:47  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/40422_m_000196_0.deflate  <r 3> 3055  2008-12-07 01:46  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000033_0     <r 3> 0     2008-12-07 01:53  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000037_1     <r 3> 0     2008-12-07 01:53  rw-r--r-- jssarma supergroup
> This happened with speculative execution. The code looks good (in fact, in
> this case many speculative tasks were launched, and only a couple caused
> problems). It almost seems as if these files did not appear in the namespace
> until after the map-reduce job finished and the movetask did a listing of
> the output dir.
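
As a reading aid for the listing above, the following Python sketch splits the file names into their apparent parts. The interpretation is an assumption based only on the names shown here: the number after "_m_" is treated as the task number and the final number as the attempt number, so an attempt of 1 (as in _tmp.40422_m_000037_1) would be a second, possibly speculative, attempt. The names OutputFile and parse_name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: [_tmp.]<id>_m_<task>_<attempt>[.<extension>]
NAME_RE = re.compile(
    r"^(?P<tmp>_tmp\.)?\d+_m_(?P<task>\d+)_(?P<attempt>\d+)(?:\.\w+)?$"
)


class OutputFile(NamedTuple):
    is_tmp: bool   # True when the name carries the "_tmp." prefix
    task: int      # assumed task number
    attempt: int   # assumed attempt number (0 = first attempt)


def parse_name(name: str) -> Optional[OutputFile]:
    """Parse one file name from the listing; return None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return OutputFile(
        is_tmp=m.group("tmp") is not None,
        task=int(m.group("task")),
        attempt=int(m.group("attempt")),
    )


if __name__ == "__main__":
    for n in ("40422_m_000195_0.deflate", "_tmp.40422_m_000037_1"):
        print(n, parse_name(n))
```

Under that reading, both leftover files are zero-length outputs of attempts that never committed, and neither task number appears among the committed .deflate files shown in this excerpt.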

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
