hive-issues mailing list archives

From "Shannon Ladymon (JIRA)" <>
Subject [jira] [Commented] (HIVE-13429) Tool to remove dangling scratch dir
Date Mon, 25 Apr 2016 23:51:12 GMT


Shannon Ladymon commented on HIVE-13429:

Thanks for the edits, [~daijy]. I also added to the wiki the note that *hive.start.cleanup.scratchdir* is not an option for multi-user environments.

> Tool to remove dangling scratch dir
> -----------------------------------
>                 Key: HIVE-13429
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 1.3.0, 2.1.0
>         Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch, HIVE-13429.3.patch, HIVE-13429.4.patch,
HIVE-13429.5.patch, HIVE-13429.branch-1.patch
> We have seen cases where a user leaves the scratch dir behind, eventually eating up HDFS storage. This can happen when a VM restarts, leaving Hive no chance to run its shutdown hook. It applies to both HiveCli and HiveServer2. Here we provide an external tool to clear dead scratch dirs as needed.
> We need a way to identify which scratch dirs are in use. We will rely on the HDFS write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and closes it only at shutdown
> 2. A cleanup process can try to open the same HDFS file for write. If the client holding the file is still running, we get an exception; otherwise, we know the client is dead
> 3. If the HDFS client dies without closing the file, the NameNode reclaims the lease after 10 minutes, i.e., the file held by the dead client becomes writable again after 10 minutes
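The actual tool probes HDFS leases through the Hadoop FileSystem API, but the probe pattern in the steps above can be sketched with local java.nio file locks as a single-machine analogy (the class and method names here are illustrative, not Hive's actual code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockProbe {
    // Probe step: try to acquire the lock ourselves. Success means the
    // original holder is gone; failure means it is still alive.
    static boolean isAbandoned(Path lockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return false;          // held by another process
            }
            lock.release();
            return true;               // nobody holds it: owner is dead
        } catch (OverlappingFileLockException e) {
            return false;              // held within this same JVM
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("scratch", ".lck");
        try (FileChannel owner = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = owner.lock();  // a live session holds its lock
            System.out.println("while held: " + isAbandoned(lockFile));   // false
            held.release();
        }
        System.out.println("after release: " + isAbandoned(lockFile));    // true
        Files.delete(lockFile);
    }
}
```

One difference from HDFS: local locks vanish the instant the holder releases or dies, whereas an HDFS lease lingers until the NameNode reclaims it, which is where the 10-minute window in step 3 comes from.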
> So here is how we remove dangling scratch directories in Hive:
> 1. HiveCli/HiveServer2 opens a well-known lock file in the scratch directory and closes it only when it is about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, checks every scratch directory and tries to open the lock file for write. If it gets no exception, the owner is dead and we can safely remove the scratch directory
> 3. The 10-minute window means a HiveCli/HiveServer2 instance may be dead while its scratch directory cannot be reclaimed for up to another 10 minutes, but this should be tolerable
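The cleanup sweep described above can be sketched the same way: walk every session directory under the scratch root, probe its lock file, and delete the directory only when the lock can be acquired. This is a local-filesystem sketch under assumed names (the lock-file name `inuse.lck` and the class here are illustrative, not necessarily Hive's exact implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class ScratchDirSweep {
    static final String LOCK_NAME = "inuse.lck";  // assumed lock-file name

    // Remove every session dir under root whose lock file is no longer held;
    // returns how many directories were removed.
    static int sweep(Path root) throws IOException {
        int removed = 0;
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root)) {
            for (Path dir : dirs) {
                Path lock = dir.resolve(LOCK_NAME);
                if (!Files.isDirectory(dir) || !Files.exists(lock)) {
                    continue;                     // not a session scratch dir
                }
                if (canAcquire(lock)) {           // owner gone: safe to remove
                    deleteRecursively(dir);
                    removed++;
                }
            }
        }
        return removed;
    }

    // Same probe as in the write-lock steps: acquirable means abandoned.
    static boolean canAcquire(Path lock) throws IOException {
        try (FileChannel ch = FileChannel.open(lock, StandardOpenOption.WRITE)) {
            FileLock l = ch.tryLock();
            if (l == null) {
                return false;
            }
            l.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false;
        }
    }

    // Delete children before parents by walking paths in reverse order.
    static void deleteRecursively(Path p) throws IOException {
        try (Stream<Path> walk = Files.walk(p)) {
            walk.sorted(Comparator.reverseOrder()).forEach(q -> {
                try {
                    Files.delete(q);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A live session is skipped because its lock probe fails, while a directory whose owner died is removed; the real tool applies the same decision per scratch directory on HDFS.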

This message was sent by Atlassian JIRA
