drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jacques Nadeau" <jacques.dr...@gmail.com>
Subject Re: Review Request 30051: DRILL-1908: new window function implementation
Date Mon, 26 Jan 2015 17:51:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69617
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
<https://reviews.apache.org/r/30051/#comment114332>

    This should only be in sql parsing, not here.  Please check with Aman & Sean in how
they are disabling SQL functionality and then do the same for this (based on option).



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
<https://reviews.apache.org/r/30051/#comment114361>

    You should specifically note that this doesn't cover the situation where we have multiple
distinct partitions.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
<https://reviews.apache.org/r/30051/#comment114334>

    This logic seems a little backwards from what I was expecting.  Batches are physical concept.
 Shouldn't these be stated based on p0.  For example, it seems like we should be able to process
p0 the moment we have received b0.  We can't do p1 until we get to b2 and see that p1 has
ended, etc.  As you've written it sounds like we can't do p0 until we know the end of p1.
 I guess this might make sense if we have small partitions and you're trying to work a batch
at a time inside the generated code.  This sacrifices more memory for potentially better performance.
 Is this the reason you're operating this way?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
<https://reviews.apache.org/r/30051/#comment114338>

    Given that you're not handling schema change, it would be best to fail rather than return
incorrect result.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
<https://reviews.apache.org/r/30051/#comment114339>

    We should save the failure state in the fragment context here.  I think the method is
fail(Exception e) or similar.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
<https://reviews.apache.org/r/30051/#comment114341>

    Let's discuss this.  I think there is a disconnect here.



exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
<https://reviews.apache.org/r/30051/#comment114362>

    Please add a negative test case when using multiple partitions.  In that case, the failure
should happen in the planning phase, not execution.


- Jacques Nadeau


On Jan. 21, 2015, 11:05 p.m., abdelhakim deneche wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30051/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2015, 11:05 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Bugs: DRILL-1908
>     https://issues.apache.org/jira/browse/DRILL-1908
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> In order to fix DRILL-1487 a complete rewrite of the StreamingWindowFrameRecordBatch
was needed. This patch adds a new WindowFrameRecordBatch that correctly handles window functions
with or without order by clauses.
> This code still lacks support for frame clauses and may be optimized to reduce unneeded
frame computations.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5

>   common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c 
>   contrib/data/pom.xml 86075f2 
>   contrib/data/window-test-data/pom.xml PRE-CREATION 
>   exec/java-exec/pom.xml 90734a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 190c13f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d

>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee

>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
9b8929f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
a3e7940 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
b4e3fed 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
9588cef 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java
f1a8bc0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
00c20b2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
f20627d 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
7c04477 
>   exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

> 
> Diff: https://reviews.apache.org/r/30051/diff/
> 
> 
> Testing
> -------
> 
> Unit tests are available to test window functions in mulitple scenarios:
> - b1.p1: single batch with a single partition
> - b1.p2: 2 batches, each containing a different parition
> - b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
> - b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch and has
rows in 3 batches
> - b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an edge case:
the 2nd time innerNext() is called, WindowFrameRecordBatch has enough saved batches to call
it's framer.doWork() without the need to call next(incoming)
> 
> All tests, except the last one, come in 2 variations: with and without "order by" clause
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message