quickstep-dev mailing list archives

From jianqiao <...@git.apache.org>
Subject [GitHub] incubator-quickstep pull request #122: Add backend support for LIPFilters.
Date Mon, 24 Oct 2016 07:32:21 GMT
GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/122

    Add backend support for LIPFilters.

    This PR follows #113 and #118 and adds backend support for LIPFilters.
    - `BuildHashOperator` supports building of LIPFilters.
    - `SelectOperator`, `HashJoinOperator` and `AggregateOperator` support probing of LIPFilters.

    
    For `SelectOperator` and `AggregateOperator`, if a filter predicate is present, then
the LIPFilters are applied AFTER the filter predicate.
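
    The ordering described above can be sketched as follows. This is a minimal illustration, not the operators' actual code: `ToyLIPFilter`, `select_with_lip`, the `(join_key, value)` row layout, and the modulo-based bit vector are all hypothetical stand-ins. It assumes a LIPFilter acts as a membership filter that may admit false positives but never rejects a key that was inserted.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, int]  # (join_key, value); a hypothetical row layout


class ToyLIPFilter:
    """Hypothetical stand-in for a LIPFilter: a tiny bit-vector membership filter."""

    def __init__(self, num_bits: int = 64) -> None:
        self.num_bits = num_bits
        self.bits = [False] * num_bits

    def insert(self, key: int) -> None:
        # Build side: mark the bit for this key.
        self.bits[key % self.num_bits] = True

    def may_contain(self, key: int) -> bool:
        # Probe side: False means "definitely absent"; True may be a false positive.
        return self.bits[key % self.num_bits]


def select_with_lip(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    lip_filters: Sequence[ToyLIPFilter],
) -> List[Row]:
    # Step 1: apply the ordinary filter predicate.
    survivors = [row for row in rows if predicate(row)]
    # Step 2: probe the LIP filters only for rows that passed the predicate.
    return [
        row for row in survivors
        if all(f.may_contain(row[0]) for f in lip_filters)
    ]


# Build side inserts keys 1 and 3 (as a build-side operator might).
lip = ToyLIPFilter()
for key in (1, 3):
    lip.insert(key)

rows = [(1, 5), (1, 20), (2, 30), (3, 40), (65, 50)]
result = select_with_lip(rows, lambda row: row[1] > 10, [lip])
print(result)
```

    In this toy run, row `(65, 50)` survives only because key 65 collides with key 1 in the 64-bit vector, which is why such a filter can serve only as a pre-filter ahead of an exact join.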
    
    Here are the performance results for SSB SF100 and TPC-H SF100.
    <table>
      <tr>
        <td><b>SSB SF100</b></td>
        <td><b>master (ms)</b></td>
        <td><b>w/ LIPFilter (ms)</b></td>
      </tr>
      <tr>
        <td>Q01</td>
        <td>885</td>
        <td>955</td>
      </tr>
      <tr>
        <td>Q02</td>
        <td>738</td>
        <td>821</td>
      </tr>
      <tr>
        <td>Q03</td>
        <td>707</td>
        <td>835</td>
      </tr>
      <tr>
        <td>Q04</td>
        <td>1240</td>
        <td>1114</td>
      </tr>
      <tr>
        <td>Q05</td>
        <td>853</td>
        <td>835</td>
      </tr>
      <tr>
        <td>Q06</td>
        <td>751</td>
        <td>975</td>
      </tr>
      <tr>
        <td>Q07</td>
        <td>3109</td>
        <td>2116</td>
      </tr>
      <tr>
        <td>Q08</td>
        <td>1042</td>
        <td>581</td>
      </tr>
      <tr>
        <td>Q09</td>
        <td>786</td>
        <td>710</td>
      </tr>
      <tr>
        <td>Q10</td>
        <td>603</td>
        <td>558</td>
      </tr>
      <tr>
        <td>Q11</td>
        <td>2851</td>
        <td>1410</td>
      </tr>
      <tr>
        <td>Q12</td>
        <td>3279</td>
        <td>908</td>
      </tr>
      <tr>
        <td>Q13</td>
        <td>1122</td>
        <td>904</td>
      </tr>
      <tr>
        <td>Total</td>
        <td>17967</td>
        <td>12721</td>
      </tr>
    </table>
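
    To make the SSB table easier to read, the short script below derives per-query speedups from the numbers above. It is only a convenience sketch; the variable and function names are hypothetical. Summing the rows gives 17966 and 12722, each off by one from the reported totals, presumably because the per-query values were rounded.

```python
# Timings in milliseconds, copied from the SSB SF100 table: (master, with LIPFilter).
ssb_ms = {
    "Q01": (885, 955),
    "Q02": (738, 821),
    "Q03": (707, 835),
    "Q04": (1240, 1114),
    "Q05": (853, 835),
    "Q06": (751, 975),
    "Q07": (3109, 2116),
    "Q08": (1042, 581),
    "Q09": (786, 710),
    "Q10": (603, 558),
    "Q11": (2851, 1410),
    "Q12": (3279, 908),
    "Q13": (1122, 904),
}


def speedup(master_ms: int, lip_ms: int) -> float:
    # Values above 1.0 mean the LIPFilter branch is faster than master.
    return master_ms / lip_ms


# Queries that got slower with LIPFilters.
regressed = [q for q, (master, lip) in ssb_ms.items() if lip > master]
# Query with the largest speedup.
best = max(ssb_ms, key=lambda q: speedup(*ssb_ms[q]))

total_master = sum(master for master, _ in ssb_ms.values())
total_lip = sum(lip for _, lip in ssb_ms.values())

for query, (master, lip) in ssb_ms.items():
    print(f"{query}: {speedup(master, lip):.2f}x")
print("regressed:", regressed)
print("best:", best)
print(f"overall: {speedup(total_master, total_lip):.2f}x")
```

    On these numbers, Q01, Q02, Q03 and Q06 regress, Q12 improves the most (about 3.6x), and the overall speedup is about 1.41x.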
    
    For TPC-H queries, there is one issue with Q21: it requires two hash tables on the `lineitem`
relation. Because all `HashTable`s are constructed in `QueryContext` at the beginning of query
execution, 75% of the available memory slots (48569 out of 64385) are occupied and cannot be
swapped out by `StorageManager`'s `EvictionPolicy`. This causes heavy _spilling_ and results in a
running time of over 120 seconds for Q21 on the master branch and an occasional DNF on the
LIPFilter branch. A quick workaround is to enlarge the buffer pool
(set `-buffer_pool_slots=100000`). As a long-term solution, we may
    (1) reduce hash table size by using untyped values;
    (2) delay allocating hash table memory unless it is actually used;
    (3) revise scheduler to be aware of resource requirements.
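
    Option (2) could look roughly like the sketch below. It is a hedged illustration only: `SlotPool` and `LazyHashTable` are hypothetical names rather than Quickstep classes, the slot counts are made up for the example, and the real `StorageManager` interface is not modeled.

```python
from typing import Dict, List, Optional


class SlotPool:
    """Hypothetical buffer pool that only tracks how many slots are pinned."""

    def __init__(self, total_slots: int) -> None:
        self.total_slots = total_slots
        self.pinned = 0

    def pin(self, num_slots: int) -> None:
        if self.pinned + num_slots > self.total_slots:
            raise MemoryError("not enough free slots")
        self.pinned += num_slots


class LazyHashTable:
    """Hypothetical hash table that pins its memory on first insert, not at construction."""

    def __init__(self, slots_needed: int, pool: SlotPool) -> None:
        self.slots_needed = slots_needed
        self.pool = pool
        self._table: Optional[Dict[int, List[str]]] = None

    def put(self, key: int, value: str) -> None:
        if self._table is None:
            # Allocation is deferred until the table is actually used.
            self.pool.pin(self.slots_needed)
            self._table = {}
        self._table.setdefault(key, []).append(value)

    def get(self, key: int) -> List[str]:
        # Probing a never-built table allocates nothing.
        if self._table is None:
            return []
        return self._table.get(key, [])


pool = SlotPool(total_slots=100)
tables = [LazyHashTable(slots_needed=40, pool=pool) for _ in range(3)]
print(pool.pinned)  # nothing is pinned at construction time

tables[0].put(7, "row-a")
print(pool.pinned)  # only the table that is in use holds slots
```

    With eager allocation, the three tables in this toy setup would need 120 slots up front and would not fit in a 100-slot pool; with deferred allocation, only tables that are actually being built occupy slots.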
    
    (**master** branch's performance is from Harshad's experiment #121)
    <table>
      <tr>
        <td><b>TPC-H SF100</b></td>
        <td><b>master (ms)</b></td>
        <td><b>w/ LIPFilter (ms)</b></td>
        <td><b>w/ LIPFilter (ms)<br />-buffer_pool_slots=100000</b></td>
      </tr>
      <tr>
        <td>Q01</td>
        <td>16046</td>
        <td>15180</td>
        <td>15238</td>
      </tr>
      <tr>
        <td>Q02</td>
        <td>5625</td>
        <td>710</td>
        <td>744</td>
      </tr>
      <tr>
        <td>Q03</td>
        <td>6861</td>
        <td>5069</td>
        <td>4907</td>
      </tr>
      <tr>
        <td>Q04</td>
        <td>2662</td>
        <td>2617</td>
        <td>2448</td>
      </tr>
      <tr>
        <td>Q05</td>
        <td>4364</td>
        <td>5966</td>
        <td>4499</td>
      </tr>
      <tr>
        <td>Q06</td>
        <td>398</td>
        <td>401</td>
        <td>395</td>
      </tr>
      <tr>
        <td>Q07</td>
        <td>23367</td>
        <td>25836</td>
        <td>24860</td>
      </tr>
      <tr>
        <td>Q08</td>
        <td>3274</td>
        <td>1714</td>
        <td>1733</td>
      </tr>
      <tr>
        <td>Q09</td>
        <td>10050</td>
        <td>13707</td>
        <td>7789</td>
      </tr>
      <tr>
        <td>Q10</td>
        <td>15296</td>
        <td>13038</td>
        <td>12934</td>
      </tr>
      <tr>
        <td>Q11</td>
        <td>2110</td>
        <td>2344</td>
        <td>2221</td>
      </tr>
      <tr>
        <td>Q12</td>
        <td>1805</td>
        <td>2049</td>
        <td>1969</td>
      </tr>
      <tr>
        <td>Q13</td>
        <td>34220</td>
        <td>35116</td>
        <td>34915</td>
      </tr>
      <tr>
        <td>Q14</td>
        <td>771</td>
        <td>942</td>
        <td>852</td>
      </tr>
      <tr>
        <td>Q15</td>
        <td>4435</td>
        <td>4882</td>
        <td>4832</td>
      </tr>
      <tr>
        <td>Q16</td>
        <td>8661</td>
        <td>8062</td>
        <td>9522</td>
      </tr>
      <tr>
        <td>Q17</td>
        <td>160707</td>
        <td>1749</td>
        <td>1684</td>
      </tr>
      <tr>
        <td>Q18</td>
        <td>66309</td>
        <td>82505</td>
        <td>86376</td>
      </tr>
      <tr>
        <td>Q19</td>
        <td>1475</td>
        <td>1871</td>
        <td>1515</td>
      </tr>
      <tr>
        <td>Q20</td>
        <td>55381</td>
        <td>1591</td>
        <td>1491</td>
      </tr>
      <tr>
        <td>Q21</td>
        <td>121310</td>
        <td>DNF</td>
        <td>13205</td>
      </tr>
      <tr>
        <td>Q22</td>
        <td>6792</td>
        <td>6746</td>
        <td>7098</td>
      </tr>
      <tr>
        <td>Total</td>
        <td>551921</td>
        <td>232096 (w/o Q21)</td>
        <td>241228</td>
      </tr>
    </table>
    
    Note that some improvements are not orthogonal to Harshad's partitioned aggregation #121
since LIPFilters also speed up some aggregations. Roughly speaking, when both PRs are merged,
we will have an estimated overall running time of ~150s for TPC-H SF100. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep lip-refactor-backend

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #122
    
----
commit 31b05122f2278a3c1327674795eec71efe8ff452
Author: Jianqiao Zhu <jianqiao@cs.wisc.edu>
Date:   2016-09-07T18:20:43Z

    Add backend support for LIPFilters.

----


