hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harish Butani" <rhbut...@gmail.com>
Subject Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing
Date Thu, 29 May 2014 01:00:23 GMT


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java, line 65
> > <https://reviews.apache.org/r/21970/diff/1/?file=597243#file597243line65>
> >
> >     Seems like there is no call of this constructor with createElemContainer = false.
So, is this new boolean required ?

RollingPartition creation invokes this with false; so that underlying RowContainer is not
allocated.


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java, line
168
> > <https://reviews.apache.org/r/21970/diff/1/?file=597245#file597245line168>
> >
> >     Good to add comment that range based windows not supported yet.

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java, line
172
> > <https://reviews.apache.org/r/21970/diff/1/?file=597245#file597245line172>
> >
> >     Add comment that unbounded windows not supported yet.

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java, line 87
> > <https://reviews.apache.org/r/21970/diff/1/?file=597250#file597250line87>
> >
> >     Do we need this boolean? All ranking functions can support streaming.. isn't
it?

No CumeDist and PercentRank don't support streaming
Also supportsStreaming is used by control how rank computation is done.
In case of Rank/DenseRank there is no separate class for Streaming. Same class handles both
modes;
flag is used to distinguish how ranking is achieved.


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java,
line 31
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line31>
> >
> >     Better name : GenericUDAFStreamingEvaluator ?

as we discussed Enhancer because it provides Streaming mode to any Evaluator (provided incremental
aggregations can be done based on cumulative aggregates)


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java,
line 52
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line52>
> >
> >     This can always be computed using preceding + following, so we can get rid of
it.

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java,
line 78
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line78>
> >
> >     Can you add a comment for this? especially last factor of wSize * underlying.

done. (the formula was wrong)


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, line
1225
> > <https://reviews.apache.org/r/21970/diff/1/?file=597257#file597257line1225>
> >
> >     Better name : funcArgs?

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java, line 66
> > <https://reviews.apache.org/r/21970/diff/1/?file=597244#file597244line66>
> >
> >     I think we can do new ArrayList(precedingSpan + followingSpan). Also, since
we currently support only fixed size windows, this could even be an array.

initialized arraylist
yes using array is possible, not doing here though


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java, line 53
> > <https://reviews.apache.org/r/21970/diff/1/?file=597244#file597244line53>
> >
> >     For more clarity.. it will help to show these four indices using ASCII art,
like following :
> >     
> >     --------------------------------------
> >            |      |         |         |
> >

done


- Harish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/#review44178
-----------------------------------------------------------


On May 28, 2014, 5:40 a.m., Harish Butani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21970/
> -----------------------------------------------------------
> 
> (Updated May 28, 2014, 5:40 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-7062
>     https://issues.apache.org/jira/browse/HIVE-7062
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> 1. Have the Windowing Table Function support streaming mode.
> 2. Have special handling for Ranking UDAFs.
> 3. Have special handling for Sum/Avg for fixed size Wdws.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624 
>   ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf.q.out eb4997d 
>   ql/src/test/results/clientpositive/windowing.q.out 7e23497 
>   ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c 
> 
> Diff: https://reviews.apache.org/r/21970/diff/
> 
> 
> Testing
> -------
> 
> run existing windowing and ptf tests
> Add unit tests for StreamingSum and StreamingAvg evaluators.
> 
> 
> Thanks,
> 
> Harish Butani
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message