hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Mollitor (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-22402) Deprecate Hive PerfLogger
Date Fri, 25 Oct 2019 18:56:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

David Mollitor updated HIVE-22402:
----------------------------------
    Attachment: HIVE-22402.2.patch

> Deprecate Hive PerfLogger
> -------------------------
>
>                 Key: HIVE-22402
>                 URL: https://issues.apache.org/jira/browse/HIVE-22402
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HIVE-22402.1.patch, HIVE-22402.2.patch
>
>
> Recently I wanted to add some additional capability, and add more, performance logging
to support my troubleshooting efforts. I started looking at PerfLogger and started to examine
its usage. I discovered a few things:
>  # Since 'loggers' must be open and closed manually, I found a couple of places where
loggers were opened, but not closed, rendering them useless
>  # Since 'loggers' must be closed manually, I found a few places where an early-return
or Exception thrown would cause a logger to not be closed, thereby rendering it useless
>  # Session information is not logged, so it can be difficult to precisely pinpoint which
session is taking lots of time
>  # PerfLogger overloaded. Most of the time, it's being used as a simple timer mechanism
with automatic logging in SLF4J debug. However, it is also a facade over the Hive Metrics
subsystem and timing results are automatically published to Metrics and then there becomes
this dependency on a 'logger' to be able to access metric data as well.
> The last bullet is the most challenging part and why I propose to deprecate the Hive
{{PerfLogger}} and not simply remove it. I am proposing a new system... a {{PerfTimer}} that
is allows for Java 8's try-with-resources feature to protect against the developer having
to care about manually close measurements and not having to carefully consider all early-exits.
The base implementation logs to SLF4J. An extended version automatically publishes to the
Hive Metric subsystem as well.
> The Hive {{PerfLogger}} has a bit of a clunky system for allowing plugable implementations.
However, there are sections of the code that rely on the {{PerfLogger}}'s ability to publish
to the Metrics subsystem. These code section look up various timers in the Metrics Subsytem
and publish the results back to the client. Since, in theory, the implementation is plugable,
any other implementation that does not publish to the Metrics Subsystem will break these code
paths. Also, these code paths create and interact with {{PerfLoggers in a static way, and
then the publishing code pulls the data from the {{PerfLogger}}}} (as a facade to the Metrics
subsystem) in a static way. Therefore, when I tried to replace the entire {{PerfLogger}} code,
I came across an issue because there is not (and should not) be a way to just statically pull
this information down from any point in the code. Information that is required for publishing
should be passed around within some sort of context object, separate from the Metrics subsystem.
There was no obvious way to string a new {{PerfTimer}} to all the required locations. I propose
marking the {{PerfLogger}} as deprecated and leaving these complex section alone. Instead,
replace only the simple "I want a timer" use cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message