drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "abdelhakim deneche" <adene...@gmail.com>
Subject Re: Review Request 30051: DRILL-1908: new window function implementation
Date Mon, 26 Jan 2015 18:35:04 GMT


> On Jan. 26, 2015, 5:51 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
line 72
> > <https://reviews.apache.org/r/30051/diff/3/?file=828946#file828946line72>
> >
> >     You should specifically note that this doesn't cover the situation where we
have multiple distinct partitions.

What do you mean by "multiple distinct partitions" ?


> On Jan. 26, 2015, 5:51 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
line 87
> > <https://reviews.apache.org/r/30051/diff/3/?file=828946#file828946line87>
> >
> >     This logic seems a little backwards from what I was expecting.  Batches are
physical concept.  Shouldn't these be stated based on p0.  For example, it seems like we should
be able to process p0 the moment we have received b0.  We can't do p1 until we get to b2 and
see that p1 has ended, etc.  As you've written it sounds like we can't do p0 until we know
the end of p1.  I guess this might make sense if we have small partitions and you're trying
to work a batch at a time inside the generated code.  This sacrifices more memory for potentially
better performance.  Is this the reason you're operating this way?

This comes from the first iterations of the window operator: processing one "full" batch at
a time makes it possible to transfer all incoming vectors into the outgoing container. Once
I started supporting "order by" and window frames, vector transfers could no longer be done
in the general case. The code could be optimized to detect when transfers are possible.

It is possible to process and return partitions as soon as they end, but the remaining rows
of the processed batch need to be kept into memory until they are ready to be processed.


- abdelhakim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69617
-----------------------------------------------------------


On Jan. 21, 2015, 11:05 p.m., abdelhakim deneche wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30051/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2015, 11:05 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Bugs: DRILL-1908
>     https://issues.apache.org/jira/browse/DRILL-1908
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> In order to fix DRILL-1487 a complete rewrite of the StreamingWindowFrameRecordBatch
was needed. This patch adds a new WindowFrameRecordBatch that correctly handles window functions
with or without order by clauses.
> This code still lacks support for frame clauses and may be optimized to reduce unneeded
frame computations.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5

>   common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c 
>   contrib/data/pom.xml 86075f2 
>   contrib/data/window-test-data/pom.xml PRE-CREATION 
>   exec/java-exec/pom.xml 90734a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 190c13f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d

>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee

>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
9b8929f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
a3e7940 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
b4e3fed 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
9588cef 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java
f1a8bc0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
00c20b2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
f20627d 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
7c04477 
>   exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

> 
> Diff: https://reviews.apache.org/r/30051/diff/
> 
> 
> Testing
> -------
> 
> Unit tests are available to test window functions in mulitple scenarios:
> - b1.p1: single batch with a single partition
> - b1.p2: 2 batches, each containing a different parition
> - b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
> - b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch and has
rows in 3 batches
> - b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an edge case:
the 2nd time innerNext() is called, WindowFrameRecordBatch has enough saved batches to call
it's framer.doWork() without the need to call next(incoming)
> 
> All tests, except the last one, come in 2 variations: with and without "order by" clause
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message