hive-dev mailing list archives

From " (JIRA)" <>
Subject [jira] [Commented] (HIVE-2453) Need a way to categorize queries in hooks for improved logging
Date Mon, 19 Sep 2011 17:11:11 GMT

Comment on HIVE-2453:

This is an automatically generated e-mail. To reply, visit:

(Updated 2011-09-19 17:09:57.838587)

Review request for hive and Ning Zhang.


QueryProperties now captures "distribute by", as Ning requested, and also "cluster by", which
seemed like a logical addition.

I added test cases for these as well.


The information useful for categorizing queries is most readily available in the Semantic
Analyzer, where the data from the Parser is interpreted.  I added a new class that collects
this data there and ultimately places it in the QueryPlan, where it is available
to hooks.

The information I collect is whether or not the query has the following clauses:
  Group By
  Order By
  Sort By
  Group By after a Join clause

Also, I store whether or not a script is used for mapping or reducing.
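
To make the shape of this data concrete, here is a minimal sketch of a flag container along
the lines described above. It is not the code from the patch: the class name
QueryPropertiesSketch, the field names, and the categories() helper are all hypothetical.
Only the set of flags (including "distribute by" and "cluster by" from the update) comes from
this description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the flag container described in this review.
// All names here are illustrative assumptions, not taken from the patch.
class QueryPropertiesSketch {
    boolean hasGroupBy;
    boolean hasOrderBy;
    boolean hasSortBy;
    boolean hasJoinFollowedByGroupBy;
    boolean hasDistributeBy;
    boolean hasClusterBy;
    boolean usesScript; // a script is used for mapping or reducing

    // Returns the labels of all flags that are set, in a fixed order,
    // so a logging hook could record them as query categories.
    List<String> categories() {
        List<String> out = new ArrayList<>();
        if (hasGroupBy) out.add("GROUP_BY");
        if (hasOrderBy) out.add("ORDER_BY");
        if (hasSortBy) out.add("SORT_BY");
        if (hasJoinFollowedByGroupBy) out.add("JOIN_THEN_GROUP_BY");
        if (hasDistributeBy) out.add("DISTRIBUTE_BY");
        if (hasClusterBy) out.add("CLUSTER_BY");
        if (usesScript) out.add("USES_SCRIPT");
        return out;
    }
}
```

In this sketch the analyzer would set each flag as it encounters the corresponding clause, and
a hook would read the result later; how the real class is populated is not shown here.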

This addresses bug HIVE-2453.

Diffs (updated)

  trunk/ql/src/java/org/apache/hadoop/hive/ql/ 1170719 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/ PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ 1170719 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ 1170719 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/ PRE-CREATION

  trunk/ql/src/test/queries/clientpositive/query_properties.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/query_properties.q.out PRE-CREATION 



I added a new test that runs a variety of queries so that each flag in QueryProperties
is set by at least one query, and some flags are set in combination.
I also added a hook that prints the contents of QueryProperties to standard error on the console.

I checked the output in the results file and verified it matched what I expected.
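
As a rough illustration of what such a printing hook does, the sketch below writes one line
per flag to an error stream. It is a standalone approximation, not the hook from the patch:
the class name, the print method, and the flag labels are hypothetical, and it takes a plain
map instead of reading QueryProperties from the QueryPlan.

```java
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a hook body that dumps query flags to an error stream.
// A real hook would obtain the flags from the QueryPlan it is given.
class PrintQueryPropertiesHookSketch {

    // Writes "<label>: <value>" for every flag, in insertion order.
    static void print(Map<String, Boolean> flags, PrintStream err) {
        for (Map.Entry<String, Boolean> e : flags.entrySet()) {
            err.println(e.getKey() + ": " + e.getValue());
        }
    }

    public static void main(String[] args) {
        // Example flags for a query with a group by and no sort by (made-up labels).
        Map<String, Boolean> flags = new LinkedHashMap<>();
        flags.put("Has Group By", true);
        flags.put("Has Sort By", false);
        print(flags, System.err);
    }
}
```

Printing to the error stream means the flag values land in the console output captured for
the test, which is what makes them checkable against the expected results file.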



> Need a way to categorize queries in hooks for improved logging
> --------------------------------------------------------------
>                 Key: HIVE-2453
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-2453.1.patch.txt, HIVE-2453.2.patch.txt
> We need a way to categorize queries, such as whether or not they include a join clause,
a group by clause, etc., in the hooks.  This will allow for better performance logging.
> Currently the only way I can find is to go through the operators in the tasks, but which
operators are used for the different types of queries may change over time.

This message is automatically generated by JIRA.
For more information on JIRA, see:

