hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-4126) remove support for lead/lag UDFs outside of UDAF args
Date Fri, 08 Mar 2013 00:56:13 GMT

     [ https://issues.apache.org/jira/browse/HIVE-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4126:
------------------------------

    Attachment: HIVE-4126.D9105.2.patch

hbutani updated the revision "HIVE-4126 [jira] remove support for lead/lag UDFs outside of UDAF args".

    - add PTF clause to grammar
    - PTF Test Queries
    - Query Data
    - corrected data file name
    - Merge branch 'hive-896' of github.com:hbutani/hive into hive-896
    - windowing + hive attempt
    - Hooking QueryDef to QB
    - ptf source is a subQuery, not a select statement
    - add windowing clauses in grammar
    - fix grammar exception issue
    - associate PTF nodes with corresponding Insert node, if any
    - minor grammar fixes
    - flesh out processing PTF tree in phase1
    - associate PTFs in dest node handling in Phase1
    - Classes not needed
    - Merge branch 'crook' of github.com:hbutani/hive into crook
    - Merge with apache hive
    - handle SortBy, Having and Window clauses in Phase1
    - remove ambiguities in Hive.g
    - Merge branch 'crook' of https://github.com/hbutani/hive into crook
    - syntactically allow a window specification in a selectItem
    - tweak QuerySpec building.
    - check that there is no GBy and where when deciding if a Windowing
    - Merge branch 'crook' of https://github.com/hbutani/hive into crook
    - Add several checks:
    - QSpec to QDef (Checked qdef serialization and deserialization: it works)
    - refactor the PTF ifc.
    - refactor: rename annotation classes to end in Description
    - refactor: move annotation classes to ql.exec package, where the
    - refactor: move window functions to ql.udf.generic package
    - refactor: move GenericUDFLeadLag to ql.udf.generic
    - refactor: move TablFunc base classes to ql.udf.ptf
    - refactor: move PTblFuncs to ql.udf.ptf
    - refactor: move Order enum to ptf.query.spec package, so that i can
    - refactor: remove classes in ptf.metadata package. Not needed now
    - refactor: FunctionRegistry, extract FunctionInfo classes; step 1 in
    - fix logic that checks for windowing specifications in Select List
    - reenable ensurePTFChainHasPartitioning;
    - Added aliasToAST map: To setup expressions map in PTF's output
    - Added AST expression: Used to populate PTF RowResolver's expression map
    - Using input operator's RowResolver to construct OI for HiveTableDef
    - Use of SCRIPT DependencyType for processing rule on PTFOperator.
    - Cleanup: Changed prototype of translate method in Translator. Commented
    - Add utility methods:
    - Add method to get operator name in PTFOperator.
    - 1. Translation of QuerySpec to QueryDef
    - Merge branch 'hive-896' into crook
    - Following changes:
    - refactor genPTFPlan:
    - minor bug: clear Agg. and Distinct Agg. lists in QB ParseInfo.
    - flesh out SelectDef translation:
    - introduce initializeOutputOI and initializeRawInputOI to TableFunction
    - When constructing the RowResolver for the Windowing or Noop PTFs:
    - add the columns from the last PTF, before adding any
    - 1. Change logic of how/which TableFunc is added to a QuerySpec: if query
    - During QDef deserialization use the passed in inputOI as the OI of the
    - for subQueries as input to PTF, construct a HiveTableSpec.
    - Tests successful for queries with: windowing, lead/lag, noop, gby, having, join with lead/lag, join with noop
    - support an alias for a PTF invocation. This is needed so that a PTF
    - add test for alias in ptf invocation
    - translate ptf invocations in the from clause (that are not associated
    - add tests that exercise generation of separate PTFOps for PTFChain and
    - Fix aggregations bug: move aggregation expressions from aggregationTrees to PTF QuerySpec if no group by clause is seen at the end of phase 1
    - add support for PTF invocation in joins.
    - adding tests for ptf invocation in joins
    - handle mixed case aliases.
    - mixed case alias test
    - Create PTF Map-side RR:
    - Merge branch 'hive-896' into crook
    - fix Hive.g merge issue: duplicate KW_ROWS definition
    - Having: Tests to support having with windowing and ptf in queries with no group by.
    - - during handleClusterOrDistributeByForWindowing invoke the
    - Merge branch 'crook' of https://github.com/hbutani/hive into crook
    - add tests to check
    - when extracting Windowing clauses from selectList handle the case
    - when extracting Windowing clauses from selectList handle the case
    - More tests with UDAFs, statistical and distribution functions
    - No need to specify Writable option to copy object.
    - Following changes:
    - Tests:
    - disallow Count/Sum distinct with windowing
    - refactor ptf.translate package:
    - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
    - refactor ptf.query.specification package
    - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
    - refactor ptf.query.definition package
    - Get rid of ql.ptf.functions package
    - move PTFSpec to hive.ql.parse package
    - move PTFDef to hive.ql.plan package
    - rename QueryInput Def & Spec class names to better reflect their
    - refactor ptf.runtime package
    - Setup PTFSpec for QB at the end of phase I for cases where it is not already done.
    - Rollback change for os.family
    - Remove individual test files: all test queries are in
    - get rid ptf.io package
    - minor cleanup in ptf.ds package
    - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
    - remove ptf.utils.HiveUtils
    - cleanup of ptf.query.translate package
    - merge with hive
    - cleanup and document data struct additions on SemAly, QB, QBParseInfo.
    - Changed error message for negative test: ptf_negative_NoSortNoDistByClause
    - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
    - Remove NPath code
    - Merge with PTF HEAD
    - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
    - Allow use of constant expressions in select clause for queries with No GBY, No PTF and No windowing
    - cleanup: get rid of WindowingTypeCheckFactory; use hive's
    - check for errors from TypeCheckFactory when building
    - allow Windowing invocations w/o aliases
    - cleanup: move remaining ParseUtils function to PTFTranslator
    - cleanup:
    - cleanup: move PTFPartition to ql.exec package
    - carry forward the expression mappings in the RRs constructed for PTFs
    - use column position to generate internal names for PTF Op's RR.
    - Resolve merge conflicts
    - Normalize line endings
    - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
    - recover changes to PTFOp lost because of merge and CRLF issues
    - support different UDAF invocations on the same UDAF but different
    - fix range based scanner and add range based window tests
    - - set the first Windowing clause encountered in a UDAF invocation as the
    - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
    - Account for empty partitions while closing the PTFOperator
    - support different literal types in constant expressions in select list
    - fix merge issues due to CRLF
    - add tests for Partition & Order specs specified
    - rules for inferring the default Partitioning Spec.
    - To avoid losing changes due to CRLF issue
    - Do not process queries with constants in select and let Hive handle them
    - redo check if a SelectList constitutes a valid GBy query.
    - Resolve merge conflicts with apache_hive
    - Compare expressions trees using toStringTree() in translateOrder
    - for distinct queries filter out exprs handled by windowing
    - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
    - when validating a SelectList for GBy, account for FUNCTION_DI tokens
    - add test for select distinct + windowing
    - get rid of WindowingException;
    - cleanup ptf.utils.Utils and ptf.query.SerializationUtils
    - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
    - fix merge issue: we already had the methods in ASTNode for antlr3.4
    - clean ptf.ds package; move code to PTFPersistence class
    - Fix negative tests with new messages (WindowingException removed)
    - add a way to disambiguate between sort expressions and fn. args
    - expose the PTFInfo via the PTFDef; so the RR can be used for
    - change ptf invocation so args come before table and partitioning spec.
    - get rid of ptf.Constants class; add new ConfVars
    - Support multi-operator function chain + tests
    - fix query componentization logic.
    - initial checkin of reviving NPath.
    - Add more tests for query componentization
    - undo inadvertent change made to .gitignore
    - change to PTF ifc. A TableFunc is now also responsible for the names of
    - finish NPath
    - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf
    - Cleanup: remove unused methods in PTFTranslator, remove equals methods
    - merge with hive
    - fix merge issue in SemanticAnalyzer
    - Fix bug: Allow partitioning spec for functions that do not support windowing
    - remove dependency on SemanticAnalyzer in PTFTranslator.
    - add OperatorType and support PTFOperator for explain plan
    - Merge branch 'ptf' of github.com:hbutani/hive into ptf
    - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
    - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf
    - Resolve Merge conflicts with apache_hive/ptf_windowing
    - refactored spec classes
    - refactored PTFDesc; for now called PTFDesc2
    - refactored translation of PTF Chain.
    - refactor Windowing translation
    - simplify RowResolver creation.
    - refactored ptfDesc deserializer
    - refactor SemanticAnalyzer:
    - refactor PTFOp and function classes to use new data structs.
    - When executing the WdwTblFunc:
    - the first Arg of a Lead/Lag function can refer to windowingFns,
    - construct PTF RR using alias specified with PTF invocation
    - semAly windowingSpec was not being set on the current QB
    - For windowing set the OI for output from Wdw Processing to exactly as
    - when setting default PartSpec in WdwTabFn don't clear default Order
    - add Windowing Exprs to qbp.destToWindowingExprs; used to filter out
    - setHasWindowing not set in moveaggregationExprsToWindowingSpec
    - commit with hive
    - rename ranking functions in PTFTrans2
    - changes to ptf_general_queries.q
    - fix sort by in queries 62, 63
    - remove suffix 2 from new classes
    - add apache license comment to new classes.
    - Change get/settFunction to get/setTFunction in PTFDesc
    - Fix range condition in PTFPartition: without this change the tests do not run from CLI
    - Modify rc_file and seq_file tests to specify dist/sort condition with windowing
    - fix Deserializer bugs
    - fix npath deserialization
    - Merge remote-tracking branch 'remotes/hive/ptf-windowing' into ptf
    - remove dependency on FuncRegistry in PTFDeserializer for PTFs
    - Add fix for constructing Extract Operator RR during windowing plan generation + Add test 50 to ptf_general_queries.q
    - allow wdw fn references in wdw expressions
    - rewrite .q and .out file
    - add apache license
    - apache license headers
    - rebuild RR for Noop/NoopMap even when there is no alias. Input's
    - add comments to PTFOp, WdwFuncDesc
    - fix lint issues
    - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
    - Fix negative tests
    - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
    - Column Pruner support for PTFOperator: For HIVE-4035
    - minor changes and document PTF ColumnPruner
    - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
    - Remove setting the hive.ptf.partition.persistence.memsize in ptf_general_queries.q
    - Merge branch 'ptf' of github.com:hbutani/hive into ptf
    - merge with hive
    - Resolve merge conflicts with apache_hive/ptf-windowing
    - Add new tests on Lead to ptf_general_queries.q + fix names
    - Resolve Merge conflicts with ptf-windowing + Renumber last 3 test cases
    - Resolve newline issues
    - Merge remote-tracking branch 'remotes/hive/ptf-windowing' into ptf
    - Merge remote-tracking branch 'hive/ptf-windowing' into ptf
    - HIVE-4082: Refactor tests
    - Resolve merge issues after merge with ptf-windowing
    - Resolve merge conflicts with ptf-windowing
    - Merge remote-tracking branch 'apache_hive/ptf-windowing' into ptf
    - Merge remote-tracking branch 'apache_hive/ptf-windowing' into ptf
    - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf
    - HIVE-4126 [jira] remove support for lead/lag UDFs outside of UDAF args

Reviewers: JIRA, ashutoshc

REVISION DETAIL
  https://reviews.facebook.net/D9105

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D9105?vs=29205&id=29415#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  data/files/flights_tiny.txt
  data/files/part.rc
  data/files/part.seq
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/leadlag.q
  ql/src/test/queries/clientpositive/leadlag_queries.q
  ql/src/test/queries/clientpositive/ptf.q
  ql/src/test/queries/clientpositive/windowing.q
  ql/src/test/queries/clientpositive/windowing_expressions.q
  ql/src/test/results/clientpositive/leadlag.q.out
  ql/src/test/results/clientpositive/leadlag_queries.q.out
  ql/src/test/results/clientpositive/ptf.q.out
  ql/src/test/results/clientpositive/windowing.q.out
  ql/src/test/results/clientpositive/windowing_expressions.q.out

To: JIRA, ashutoshc, hbutani

                
> remove support for lead/lag UDFs outside of UDAF args
> -----------------------------------------------------
>
>                 Key: HIVE-4126
>                 URL: https://issues.apache.org/jira/browse/HIVE-4126
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: HIVE-4126.D9105.1.patch, HIVE-4126.D9105.2.patch
>
>
> Select expressions such as
> p_size - lead(p_size,1)
> are currently handled as non-aggregation expressions that are evaluated after all over clauses are evaluated.
> Once we allow different partitions in a single select list (Jira 4041), these become ambiguous.
> The equivalent ways to do such things are either to:
> - use lead/lag UDAFs with expressions (support added with Jira 4081), or
> - stack windowing clauses with inline queries: select lead(r,1).. from (select rank() as r....)...
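
The two alternatives named in the description can be sketched as follows. This is only an illustration under stated assumptions, not the queries from the patch's test files: it runs standard window-function SQL through Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for Hive, and the table rows, the p_mfgr and p_name columns, and the partitioning and ordering choices are hypothetical. Only p_size, lead, rank and the alias r come from the description above.

```python
import sqlite3

# Hypothetical stand-in for a Hive "part" table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (p_mfgr TEXT, p_name TEXT, p_size INTEGER)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [("m1", "a", 10), ("m1", "b", 7), ("m1", "c", 12), ("m2", "d", 5)],
)

# Alternative 1: lead() carries its own OVER clause, so the partition
# it reads the next row from is stated explicitly.
size_delta = conn.execute(
    """
    SELECT p_name,
           p_size - lead(p_size, 1)
                    OVER (PARTITION BY p_mfgr ORDER BY p_name) AS delta
    FROM part
    ORDER BY p_mfgr, p_name
    """
).fetchall()

# Alternative 2: compute rank() in an inline query, then apply lead()
# to that rank in the outer query with its own windowing clause.
next_rank = conn.execute(
    """
    SELECT p_name,
           lead(r, 1) OVER (PARTITION BY p_mfgr ORDER BY p_name) AS next_r
    FROM (SELECT p_mfgr, p_name,
                 rank() OVER (PARTITION BY p_mfgr ORDER BY p_size) AS r
          FROM part) ranked
    ORDER BY p_mfgr, p_name
    """
).fetchall()

print(size_delta)
print(next_rank)
```

In both forms every lead call names the window it operates on, so there is no bare lead/lag left in the select list whose partition would have to be guessed once several partitionings are allowed side by side.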

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
