hive-dev mailing list archives

From "Marcelo Vanzin (JIRA)" <>
Subject [jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
Date Sat, 13 Dec 2014 00:34:13 GMT


Marcelo Vanzin commented on HIVE-9017:

In Spark-speak, an "executor" is the JVM that executes tasks. There's no established name for the
individual threads within an executor; you could call them "task runners", but it's rare to see
anyone even talk about those.
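As an aside, the number of those task-runner threads per executor is bounded by the standard Spark setting {{spark.executor.cores}}; a minimal sketch (the job jar name is made up):

```shell
# Each executor JVM runs up to spark.executor.cores tasks concurrently,
# each task on its own "task runner" thread inside that JVM.
spark-submit \
  --master yarn \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=4g \
  my-job.jar   # hypothetical application jar
```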

As for "can you run more than one executor per host": the answer is yes, but it's a little
more complicated than that.

In Yarn mode, it's definitely possible, but then Yarn doesn't suffer from this issue.

In standalone mode, it's unusual. You can achieve that in two ways:

- run with a "local-cluster" master, which HoS uses for testing. But people shouldn't use
that in production.
- run multiple "Worker" daemons on the same host; I don't know whether that's supported, but
right now Spark standalone has a 1:1 relationship between Worker daemons and executors.

But, long story short, you can't delete these files when the executor goes down. That would
break Yarn mode, and even in standalone mode it's sketchy: say the executor dies and is
restarted; having these files around can avoid re-downloading a large jar from the driver
node.
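Given that delete-on-exit is out, one alternative is a periodic sweep that removes only files that haven't been touched for a while, so a restarted executor can still reuse a fresh download. A minimal sketch (not the Hive patch; the file-name patterns are guessed from the issue description, and the age threshold is arbitrary):

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: age-based cleanup of the RSC temp files instead of
// deleting them when an executor exits. Files modified recently are kept
// so a restarted executor can reuse them.
public class TempFileSweeper {
    private static final long MAX_AGE_MS = TimeUnit.DAYS.toMillis(7);

    // Deletes stale RSC-style temp files in dir; returns how many were removed.
    public static int sweep(File dir, long now) {
        int deleted = 0;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0;
        }
        for (File f : files) {
            String name = f.getName();
            // Hypothetical patterns matching the files the issue mentions.
            boolean isRscTemp = name.endsWith("_lock")
                    || name.endsWith("_cache")
                    || name.startsWith("spark-submit.");
            if (isRscTemp && f.isFile() && now - f.lastModified() > MAX_AGE_MS) {
                if (f.delete()) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        int n = sweep(new File("/tmp"), System.currentTimeMillis());
        System.out.println("deleted " + n + " stale temp files");
    }
}
```

The point of keying off {{lastModified}} rather than executor lifecycle is exactly the restart scenario above: a jar downloaded minutes ago survives the sweep.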

> Clean up temp files of RSC [Spark Branch]
> -----------------------------------------
>                 Key: HIVE-9017
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
> Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}},
> {{spark-submit.*.properties}}, etc.
> We should clean up these files, or they will exhaust disk space.

This message was sent by Atlassian JIRA
