hive-dev mailing list archives

From "Matt Goeke (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3453) Hive query persistence / auditing
Date Wed, 12 Sep 2012 22:35:07 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Goeke updated HIVE-3453:
-----------------------------

    Component/s: Thrift API
    
> Hive query persistence / auditing
> ---------------------------------
>
>                 Key: HIVE-3453
>                 URL: https://issues.apache.org/jira/browse/HIVE-3453
>             Project: Hive
>          Issue Type: Improvement
>          Components: CLI, Logging, Thrift API
>            Reporter: Matt Goeke
>            Priority: Minor
>
> Currently our Hive warehouse is open to querying from any of our business analysts, and
> we pool them by user in the fair scheduler to prevent any one user from hogging cluster resources.
> We are looking to start summarizing details of their queries so that we can view the common questions
> they ask in order to find ways to optimize our tables / submission process. One thought was to
> patch the Hive client / thrift server to write out the submitted queries to the DB that our
> metastore is on, and from there we can perform some simple analytics to roll up a view of how
> they use the warehouse over time. This doesn't seem like it would be too difficult an effort,
> as the needed infrastructure is already in place, but any suggestions or comments on this would
> be greatly appreciated.
> I am leaving the implementation notes pretty blank as I would like to see what others
> in the community who have more experience in this project would recommend.
> Additional information from a user@hive.apache.org response:
> Hey Matt,
> We did something similar at Facebook to capture the information on who ran what on the
> clusters and dumped that out to an audit db. Specifically, we were using Hive post execution
> hooks to achieve that:
> http://hive.apache.org/docs/r0.7.0/api/org/apache/hadoop/hive/ql/hooks/PostExecute.html
> This gets called from the hive cli mostly.
> I am not sure if the particular hook that we had implemented was contributed back, but
> this could potentially be a cool contribution :)
> Ashish
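
The audit-table idea described in the issue above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the hook used at Facebook and not part of Hive: the table name `query_audit`, its columns, and the function names are all hypothetical, and Python's built-in `sqlite3` stands in for whatever RDBMS backs the metastore. In a real deployment, the `record_query` step would presumably live in a Java post-execution hook rather than in Python.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical audit schema; a real setup would create this in the
# database that hosts the Hive metastore.
SCHEMA = """
CREATE TABLE IF NOT EXISTS query_audit (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name    TEXT NOT NULL,
    query_text   TEXT NOT NULL,
    submitted_at TEXT NOT NULL
)
"""


def open_audit_db(path=":memory:"):
    """Open (or create) the audit database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_query(conn, user_name, query_text, submitted_at=None):
    """Store one submitted query; this is the step a post-execution hook would perform."""
    if submitted_at is None:
        submitted_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO query_audit (user_name, query_text, submitted_at) "
            "VALUES (?, ?, ?)",
            (user_name, query_text, submitted_at),
        )


def queries_per_user(conn):
    """Roll up the number of recorded queries per user, busiest first."""
    rows = conn.execute(
        "SELECT user_name, COUNT(*) AS n FROM query_audit "
        "GROUP BY user_name ORDER BY n DESC, user_name"
    )
    return rows.fetchall()


def most_common_queries(conn, limit=5):
    """Return the most frequently submitted query strings with their counts."""
    rows = conn.execute(
        "SELECT query_text, COUNT(*) AS n FROM query_audit "
        "GROUP BY query_text ORDER BY n DESC, query_text LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Grouping on the raw query text only catches exact repeats; finding "common questions" in practice would likely need some normalization (whitespace, literals) or extraction of the tables referenced, which this sketch leaves out.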

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
