hive-user mailing list archives

From Gopal Vijayaraghavan
Subject Re: How to capture query log and duration
Date Fri, 20 Nov 2015 02:29:41 GMT
> We would like to capture some information in our Hadoop Cluster.
> Can anybody please suggest how we can achieve this, any tools
> available already? Or do we need to scrub any log?

Apache Atlas is the standardized solution for deeper analytics into data
ownership/usage (look at the HiveHook in Atlas).

> 1. We want to know how many queries are run every day.
> 2. What are the durations of those queries.
> 3. If any queries are failing, in what step they are failing.

For a general use-case, you are probably already writing a lot of this
data.

That only pulls the query text + plans in JSON (to automatically look for
bad plans), but the total event structure looks like this:

            "domain": "DEFAULT",
            "entitytype": "HIVE_QUERY_ID",
            "events": [
                {
                    "eventinfo": {},
                    "eventtype": "QUERY_COMPLETED",
                    "timestamp": 1447986004954
                },
                {
                    "eventinfo": {},
                    "eventtype": "QUERY_SUBMITTED",
                    "timestamp": 1447985970564
                }
            ],
            "otherinfo": {
                "STATUS": true,
                "TEZ": true,
                "MAPRED": false,
                "QUERY" : ...
            },
            "primaryfilters": {
                "requestuser": [ ... ],
                "user": [ ... ]
            }
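
As a rough sketch of how the first two questions (queries per day, durations) could be answered from events shaped like the structure above: the Python below assumes the entities have already been fetched and parsed into dicts. The function names are hypothetical, and reading "STATUS": false as a failed query is an assumption, not something the structure itself documents. It does not cover the third question, since the step at which a query failed is not part of the fields shown.

```python
from collections import Counter
from datetime import datetime, timezone


def query_duration_ms(entity):
    """Return QUERY_COMPLETED minus QUERY_SUBMITTED in milliseconds,
    or None if either event is missing (e.g. the query is still running)."""
    stamps = {e["eventtype"]: e["timestamp"] for e in entity.get("events", [])}
    try:
        return stamps["QUERY_COMPLETED"] - stamps["QUERY_SUBMITTED"]
    except KeyError:
        return None


def submit_day(entity):
    """Return the UTC date (YYYY-MM-DD) on which the query was submitted."""
    for e in entity.get("events", []):
        if e["eventtype"] == "QUERY_SUBMITTED":
            # Timestamps in the structure are epoch milliseconds.
            ts = datetime.fromtimestamp(e["timestamp"] / 1000, tz=timezone.utc)
            return ts.date().isoformat()
    return None


def summarize(entities):
    """Count queries per day, collect durations, and count assumed failures."""
    per_day = Counter()
    durations = []
    failed = 0
    for entity in entities:
        day = submit_day(entity)
        if day is not None:
            per_day[day] += 1
        duration = query_duration_ms(entity)
        if duration is not None:
            durations.append(duration)
        # Assumption: STATUS == False marks a failed query.
        if entity.get("otherinfo", {}).get("STATUS") is False:
            failed += 1
    return per_day, durations, failed


# Sample entity using only the fields and values shown in the message;
# the elided QUERY text and user filters are left out.
sample = {
    "domain": "DEFAULT",
    "entitytype": "HIVE_QUERY_ID",
    "events": [
        {"eventinfo": {}, "eventtype": "QUERY_COMPLETED", "timestamp": 1447986004954},
        {"eventinfo": {}, "eventtype": "QUERY_SUBMITTED", "timestamp": 1447985970564},
    ],
    "otherinfo": {"STATUS": True, "TEZ": True, "MAPRED": False},
}

per_day, durations, failed = summarize([sample])
print(per_day, durations, failed)
```

For the sample above, the two timestamps are 34390 ms apart, so the query ran for roughly 34 seconds.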


I have seen at least one custom KafkaHook to feed hive query plans into a
Storm pipeline, but that was custom built to police the system after an
ad-hoc query produced a 4.5 petabyte join.

