hive-dev mailing list archives

From "Ning Zhang" <nzh...@fb.com>
Subject Re: Review Request: Need a way to categorize queries in hooks for improved logging
Date Sat, 17 Sep 2011 04:41:11 GMT


> On 2011-09-16 21:27:59, Ning Zhang wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java, line 42
> > <https://reviews.apache.org/r/1933/diff/1/?file=41497#file41497line42>
> >
> >     can you split it into 2 parts: useScriptInMapper and useScriptInReducer?
> 
> Kevin Wilfong wrote:
>     Determining whether a script is used in the mapper or the reducer will require going
>     through the operator tree added to each Map Reduce job to determine if a Transform operator
>     is there and then setting the appropriate flag.  That is more work than I'd like to do here
>     considering this feature will probably not be used by most users.  I would like to keep the
>     flag here, so that it can be decided if that work needs to be performed somewhere else.

OK. My original reason for splitting this into mapper and reducer flags was that we could analyze
the cost of the script operator based on its input size (mappers and reducers have different
input size metrics). Let's see whether the two flags are needed in the future and file a follow-up JIRA then.
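
For reference, the operator-tree walk Kevin describes could look roughly like the sketch below.
It is only an illustration: OpNode, the "SCRIPT" kind string, and containsScript are hypothetical
stand-ins, not Hive's operator classes or anything in the patch under review.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for one operator in a plan tree; NOT Hive's Operator class.
final class OpNode {
    final String kind;                              // e.g. "TS", "SEL", "SCRIPT" (assumed labels)
    final List<OpNode> children = new ArrayList<>();

    OpNode(String kind, OpNode... kids) {
        this.kind = kind;
        for (OpNode k : kids) {
            children.add(k);
        }
    }
}

final class ScriptFlagSketch {
    // Iterative depth-first walk: returns true if any node in the tree is a script operator.
    static boolean containsScript(OpNode root) {
        if (root == null) {
            return false;
        }
        Deque<OpNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            OpNode n = stack.pop();
            if ("SCRIPT".equals(n.kind)) {
                return true;
            }
            for (OpNode c : n.children) {
                stack.push(c);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One tree per side of a job; each side gets its own flag.
        OpNode mapSide = new OpNode("TS", new OpNode("SEL", new OpNode("SCRIPT")));
        OpNode reduceSide = new OpNode("GBY", new OpNode("FS"));
        boolean usesScriptInMapper = containsScript(mapSide);
        boolean usesScriptInReducer = containsScript(reduceSide);
        System.out.println("mapper=" + usesScriptInMapper + " reducer=" + usesScriptInReducer);
    }
}
```

The walk would have to run once per map-side and reduce-side tree of every job, which is the
extra work being deferred here.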



- Ning


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1933/#review1946
-----------------------------------------------------------


On 2011-09-17 00:14:50, Kevin Wilfong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1933/
> -----------------------------------------------------------
> 
> (Updated 2011-09-17 00:14:50)
> 
> 
> Review request for hive and Ning Zhang.
> 
> 
> Summary
> -------
> 
> The information that would be useful for categorizing queries is clearest in the Semantic
> Analyzer, when the data from the Parser is interpreted.  I added a new class which is designed
> to collect that data here, and place it ultimately in the QueryPlan where it will be available
> to hooks.
> 
> The information I collect is whether or not the query has the following clauses:
>   Join
>   Group By
>   Order By
>   Sort By
>   Group By after a Join clause
> 
> Also, I store whether or not a script is used for mapping or reducing.
> 
> 
> This addresses bug HIVE-2453.
>     https://issues.apache.org/jira/browse/HIVE-2453
> 
> 
> Diffs
> -----
> 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 1170719 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1170719 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1170719 
>   trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckQueryPropertiesHook.java PRE-CREATION 
>   trunk/ql/src/test/queries/clientpositive/query_properties.q PRE-CREATION 
>   trunk/ql/src/test/results/clientpositive/query_properties.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1933/diff
> 
> 
> Testing
> -------
> 
> I added a new test, which runs a variety of queries, such that each of the flags in QueryProperties
> is set by at least one query, and also some are set in combinations.
> I also added a hook which prints the contents of QueryProperties to error on the console.
> 
> I checked the output in the results file and verified it matched what I expected.
> 
> 
> Thanks,
> 
> Kevin
> 
>
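
To make the quoted summary concrete, here is a minimal sketch of a flag container plus a hook
that prints it to an error stream. All names below (QueryPropertiesSketch, its fields, describe,
PrintPropertiesHookSketch) are assumptions made for illustration; the actual QueryProperties and
CheckQueryPropertiesHook classes are in the diff linked above and may differ.

```java
import java.io.PrintStream;

// Illustrative container for the per-query flags listed in the summary.
// Field names are assumptions, not copied from the patch.
final class QueryPropertiesSketch {
    boolean hasJoin;
    boolean hasGroupBy;
    boolean hasOrderBy;
    boolean hasSortBy;
    boolean hasJoinFollowedByGroupBy;
    boolean usesScript;   // single flag covering both mapper and reducer scripts

    // Render the flags in a fixed order so the output can be compared with a results file.
    String describe() {
        return "hasJoin=" + hasJoin + "\n"
             + "hasGroupBy=" + hasGroupBy + "\n"
             + "hasOrderBy=" + hasOrderBy + "\n"
             + "hasSortBy=" + hasSortBy + "\n"
             + "hasJoinFollowedByGroupBy=" + hasJoinFollowedByGroupBy + "\n"
             + "usesScript=" + usesScript + "\n";
    }
}

// Hypothetical hook: takes the collected properties and prints them to the given stream.
final class PrintPropertiesHookSketch {
    static void run(QueryPropertiesSketch props, PrintStream err) {
        err.print(props.describe());
    }

    public static void main(String[] args) {
        // Flags a semantic analyzer might set for a query with a join followed by a group by.
        QueryPropertiesSketch props = new QueryPropertiesSketch();
        props.hasJoin = true;
        props.hasGroupBy = true;
        props.hasJoinFollowedByGroupBy = true;
        run(props, System.err);
    }
}
```

Keeping the flags in one small object means a hook only needs the query plan to categorize a
query; it does not have to re-parse the query text.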

