hive-dev mailing list archives

From "Marcelo Vanzin (JIRA)" <>
Subject [jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
Date Tue, 25 Nov 2014 21:57:13 GMT


Marcelo Vanzin commented on HIVE-8574:

Hey [~chengxiang li], I'd like to have a better understanding of how these metrics will be
used by Hive to come up with the proper fix here.

I see two approaches:

* Add an API to clean up the metrics. This keeps the current "collect all metrics" approach,
but adds APIs to delete the metrics. It assumes that Hive will always process the metrics
of finished jobs, even if only to ask for them to be deleted.

* Suggested by [~xuefuz]: add a timeout after a job finishes for cleaning up its metrics.
This gives Hive a window after the job finishes during which the data is available; after
that, it's gone.
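To make the two options concrete, here is a minimal sketch of a metrics store supporting both: an explicit delete API (approach 1) and a scheduled eviction after job completion (approach 2). The names (`MetricsStore`, `deleteMetrics`, `jobFinished`) and the structure are illustrative assumptions, not the actual Spark client API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical store sketching both cleanup approaches discussed above.
class MetricsStore {
  private final Map<String, Object> metricsByJob = new ConcurrentHashMap<>();
  private final ScheduledExecutorService reaper =
      Executors.newSingleThreadScheduledExecutor();
  private final long ttlMillis;

  MetricsStore(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  void record(String jobId, Object metrics) {
    metricsByJob.put(jobId, metrics);
  }

  Object getMetrics(String jobId) {
    return metricsByJob.get(jobId);
  }

  // Approach 1: explicit API so Hive deletes metrics once it has
  // processed them.
  void deleteMetrics(String jobId) {
    metricsByJob.remove(jobId);
  }

  // Approach 2: when a job finishes, schedule eviction of its metrics
  // after a timeout; Hive must read them within that window.
  void jobFinished(String jobId) {
    reaper.schedule(() -> metricsByJob.remove(jobId),
        ttlMillis, TimeUnit.MILLISECONDS);
  }
}
```

Either way, the store's memory use is bounded by how quickly finished jobs are drained rather than growing forever.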

I could also add some internal checks so that the collection doesn't keep accumulating data
indefinitely if data is never deleted; e.g., track only the last "x" finished jobs, evicting
the oldest when a new job starts.
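That "last x finished jobs" safety valve can be sketched with a size-bounded `LinkedHashMap`, whose `removeEldestEntry` hook evicts the oldest insertion once the cap is exceeded. The class name and cap parameter are hypothetical, purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded store: keeps metrics for at most maxJobs jobs,
// evicting the oldest entry whenever a new one pushes past the cap.
class BoundedMetrics<V> extends LinkedHashMap<String, V> {
  private final int maxJobs;

  BoundedMetrics(int maxJobs) {
    // Insertion order; eviction is handled by removeEldestEntry below.
    super(16, 0.75f, false);
    this.maxJobs = maxJobs;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
    // Returning true tells LinkedHashMap to drop the oldest entry.
    return size() > maxJobs;
  }
}
```

This bounds memory even if Hive never explicitly deletes anything, at the cost of silently losing metrics for jobs older than the last x.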

What do you think?

> Enhance metrics gathering in Spark Client [Spark Branch]
> --------------------------------------------------------
>                 Key: HIVE-8574
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
> The current implementation of metrics gathering in the Spark client is a little hacky.
First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just
collect metrics indefinitely, so in the long term it turns into a huge memory leak.
> We need a simplified interface and some mechanism for disposing of old metrics.

This message was sent by Atlassian JIRA
