hive-dev mailing list archives

From "Phabricator (JIRA)" <>
Subject [jira] [Commented] (HIVE-4041) Support multiple partitionings in a single Query
Date Fri, 15 Mar 2013 17:52:13 GMT


Phabricator commented on HIVE-4041:

ashutoshc has commented on the revision "HIVE-4041 [jira] Support multiple partitionings in
a single Query".

  Some more questions.

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ I see.
I thought I could reproduce the same problem on trunk with the following query:
  select 1 from over10k group by 1;

  But this did not result in an NPE; the query ran successfully. Is this query a good approximation
of this code path? My motivation is to exercise this code path without an over clause, thereby
exposing the bug on trunk and fixing it there, so that we don't need to do this in the branch.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ Hmm. I
think we hold on to the schema for PTFOp way too early in the semantic phase. Apart from the changes
required here, holding on to the schema does not play well with the other compile-time optimizations
that Hive performs after semantic analysis. Other operators don't do this. I think we need to
spend a bit of time on this. Can you point me to where we hold on to the schema in SemanticAnalyzer,
and explain why it is necessary?
  ql/src/java/org/apache/hadoop/hive/ql/parse/ I am fine with doing
it in a follow-up. But if possible we should get rid of this. It will probably have a runtime
perf impact, since I think it will force a Hadoop secondary sort so that the values for a given
key come out sorted. Further, adding extra constraints will lessen the opportunity for compile-time
optimizations like filter push down (see my comments on HIVE-4180).
  ql/src/java/org/apache/hadoop/hive/ql/parse/ It would be good
to define group more concretely. If I am getting this right, a group is a set of over functions
that have the same partitioning. Is that correct?
  So a group may have multiple functions associated with it (but all on the same partitioning).
Does a group then map to one PTFOp on which multiple functions work? Or does a group imply
multiple PTFOps chained in the same reducer, one after the other, each working on its own function?
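
  To make the first reading concrete, here is a minimal sketch of a group as "all over functions
that share a partitioning". WindowFn, WindowGroups and the column names p1/p2 are hypothetical
and are not classes or names from the patch; this only illustrates the definition being asked
about and says nothing about how the patch actually builds its groups.

```java
import java.util.*;

// Hypothetical stand-in for one windowing function call; not a Hive class.
final class WindowFn {
    final String name;
    final List<String> partitionCols;

    WindowFn(String name, List<String> partitionCols) {
        this.name = name;
        this.partitionCols = partitionCols;
    }
}

final class WindowGroups {
    // Bucket function names by their partition columns, keeping first-seen order.
    static Map<List<String>, List<String>> groupByPartitioning(List<WindowFn> fns) {
        Map<List<String>, List<String>> groups = new LinkedHashMap<>();
        for (WindowFn fn : fns) {
            groups.computeIfAbsent(fn.partitionCols, k -> new ArrayList<>()).add(fn.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<WindowFn> fns = Arrays.asList(
            new WindowFn("rank", Arrays.asList("p1")),
            new WindowFn("sum", Arrays.asList("p1")),
            new WindowFn("avg", Arrays.asList("p2")));
        // Two groups: one holding rank and sum, one holding avg.
        System.out.println(groupByPartitioning(fns));
    }
}
```

  Under this reading each map entry would become one PTFOp that evaluates every function in its
list; under the second reading each function in the list would get its own chained PTFOp.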
  ql/src/java/org/apache/hadoop/hive/ql/parse/ Which filter
is this? Is this the having clause? But I thought we had already removed support for that. If not,
I think we should. Or is this the regular where clause? If the latter, we should not consume other
operators of the query in PTOperator.
  ql/src/test/queries/clientpositive/windowing_multipartitioning.q:21 It would be good to add
more tests from the Google document that I shared with you. It has multipartitioning tests
towards the end.


To: JIRA, ashutoshc, hbutani

> Support multiple partitionings in a single Query
> ------------------------------------------------
>                 Key: HIVE-4041
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: HIVE-4041.D9381.1.patch, WindowingComponentization.pdf
> Currently we disallow queries if the partition specifications of all windowing functions are not
> the same. We can relax this by generating multiple PTFOps based on the unique partitionings
> in a Query. For partitionings that differ only in sort, we can introduce a sort step in between
> PTFOps, which can happen in the same Reduce task.
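
The proposal in the issue description can be sketched roughly as follows. This is an illustration
only: WindowSpec, PtfPlanSketch, the step labels and the column names are hypothetical, and
treating each distinct set of partition columns as its own reduce stage is an assumption of the
sketch rather than something the description states.

```java
import java.util.*;

// Hypothetical window specification: partition columns plus sort columns.
final class WindowSpec {
    final List<String> partitionCols;
    final List<String> orderCols;

    WindowSpec(List<String> partitionCols, List<String> orderCols) {
        this.partitionCols = partitionCols;
        this.orderCols = orderCols;
    }
}

final class PtfPlanSketch {
    // One PTFOp per unique (partition, order) pair. Specs that share partition
    // columns stay in one reduce stage, separated only by a sort step.
    static List<String> plan(List<WindowSpec> specs) {
        Map<List<String>, Set<List<String>>> byPartition = new LinkedHashMap<>();
        for (WindowSpec s : specs) {
            byPartition.computeIfAbsent(s.partitionCols, k -> new LinkedHashSet<>())
                       .add(s.orderCols);
        }
        List<String> steps = new ArrayList<>();
        for (Map.Entry<List<String>, Set<List<String>>> e : byPartition.entrySet()) {
            steps.add("REDUCE partition=" + e.getKey());
            boolean first = true;
            for (List<String> order : e.getValue()) {
                if (!first) {
                    // Same partitioning, different sort: re-sort inside the same task.
                    steps.add("  SORT order=" + order);
                }
                steps.add("  PTFOp partition=" + e.getKey() + " order=" + order);
                first = false;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<WindowSpec> specs = Arrays.asList(
            new WindowSpec(Arrays.asList("p1"), Arrays.asList("o1")),
            new WindowSpec(Arrays.asList("p1"), Arrays.asList("o2")),
            new WindowSpec(Arrays.asList("p2"), Arrays.asList("o1")));
        for (String step : plan(specs)) {
            System.out.println(step);
        }
    }
}
```

In the sample input the two specs on p1 end up in one reduce stage with a sort step between their
PTFOps, while the spec on p2 gets a stage of its own.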

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators