hive-dev mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2453) Need a way to categorize queries in hooks for improved logging
Date Mon, 19 Sep 2011 17:11:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107989#comment-13107989 ]

jiraposter@reviews.apache.org commented on HIVE-2453:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1933/
-----------------------------------------------------------

(Updated 2011-09-19 17:09:57.838587)


Review request for hive and Ning Zhang.


Changes
-------

QueryProperties now captures "distribute by" as Ning requested, and "cluster by" as it seemed
like a logical addition.

I added test cases for these as well.


Summary
-------

The information that would be useful for categorizing queries is clearest in the Semantic
Analyzer, where the data from the Parser is interpreted.  I added a new class designed
to collect that data there and ultimately place it in the QueryPlan, where it will be available
to hooks.

The information I collect is whether or not the query has the following clauses:
  Join
  Group By
  Order By
  Sort By
  Group By after a Join clause

Also, I store whether or not a script is used for mapping or reducing.
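
For illustration only, a container for the flags listed above might look roughly like
the sketch below. This is not the code from the patch: the class name, method names, and
the categories() helper are assumptions made for the example. It also includes the
"distribute by" and "cluster by" flags mentioned under Changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-query flag holder; all names are illustrative
// and are not taken from the actual QueryProperties class in the patch.
class QueryPropertiesSketch {
    private boolean hasJoin;
    private boolean hasGroupBy;
    private boolean hasOrderBy;
    private boolean hasSortBy;
    private boolean hasJoinFollowedByGroupBy;
    private boolean usesScript;
    private boolean hasDistributeBy;
    private boolean hasClusterBy;

    // Setters that a semantic analyzer would call as it recognizes each clause.
    void setHasJoin(boolean v) { hasJoin = v; }
    void setHasGroupBy(boolean v) { hasGroupBy = v; }
    void setHasOrderBy(boolean v) { hasOrderBy = v; }
    void setHasSortBy(boolean v) { hasSortBy = v; }
    void setHasJoinFollowedByGroupBy(boolean v) { hasJoinFollowedByGroupBy = v; }
    void setUsesScript(boolean v) { usesScript = v; }
    void setHasDistributeBy(boolean v) { hasDistributeBy = v; }
    void setHasClusterBy(boolean v) { hasClusterBy = v; }

    // Returns a label for every flag that is set, in a fixed order,
    // so that a hook can log the query's categories.
    List<String> categories() {
        List<String> out = new ArrayList<>();
        if (hasJoin) out.add("join");
        if (hasGroupBy) out.add("group by");
        if (hasOrderBy) out.add("order by");
        if (hasSortBy) out.add("sort by");
        if (hasJoinFollowedByGroupBy) out.add("join followed by group by");
        if (usesScript) out.add("script");
        if (hasDistributeBy) out.add("distribute by");
        if (hasClusterBy) out.add("cluster by");
        return out;
    }
}
```

Keeping the flags as plain booleans means a hook only has to read them and never needs
to inspect the operator tree.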


This addresses bug HIVE-2453.
    https://issues.apache.org/jira/browse/HIVE-2453


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 1170719 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1170719 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1170719 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckQueryPropertiesHook.java PRE-CREATION
  trunk/ql/src/test/queries/clientpositive/query_properties.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/query_properties.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1933/diff


Testing
-------

I added a new test which runs a variety of queries, chosen so that each of the flags in
QueryProperties is set by at least one query and some flags are set in combination.
I also added a hook which prints the contents of QueryProperties to the error stream on the console.

I checked the output in the results file and verified it matched what I expected.
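
As a rough illustration of what a hook like this does, the standalone sketch below
formats a set of flags and writes them to an error stream. It does not use Hive's hook
interfaces; the class name, the flag labels, and the output format are all assumptions
for the example and do not reflect the actual CheckQueryPropertiesHook.

```java
import java.io.PrintStream;
import java.util.Map;

// Hypothetical stand-in for a hook that dumps query flags; it is not tied to
// any Hive API, and the names and output format are illustrative only.
class PrintPropertiesHookSketch {

    // Builds one "label: value" line per flag, in the map's iteration order.
    static String format(Map<String, Boolean> flags) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Boolean> e : flags.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Writes the formatted flags to the given stream, e.g. System.err.
    static void run(Map<String, Boolean> flags, PrintStream err) {
        err.print(format(flags));
    }
}
```

Passing a LinkedHashMap keeps the printed order stable, which matters when the output
is compared against a stored results file.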


Thanks,

Kevin



> Need a way to categorize queries in hooks for improved logging
> --------------------------------------------------------------
>
>                 Key: HIVE-2453
>                 URL: https://issues.apache.org/jira/browse/HIVE-2453
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-2453.1.patch.txt, HIVE-2453.2.patch.txt
>
>
> We need a way to categorize queries, such as whether or not they include a join clause,
a group by clause, etc., in the hooks.  This will allow for better performance logging.
> Currently the only way I can find is to go through the operators in the tasks, but which
operators are used for the different types of queries may change over time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
