impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IMPALA-4674) Port spilling ExecNodes to new buffer pool
Date Thu, 16 Mar 2017 22:48:41 GMT

    [ https://issues.apache.org/jira/browse/IMPALA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929076#comment-15929076
] 

Tim Armstrong commented on IMPALA-4674:
---------------------------------------


IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

Add a copy of BufferedTupleStream that allocates memory from BufferPool.
This will replace the original implementation when IMPALA-3200 is
completed.

The major changes are:
* Terminology is updated from "blocks" to "pages"
* No small buffer support is needed (hooray!).
* BufferedTupleStream needs to do its own tracking of # of rows per
  page, etc instead of relying on BufferedBlockMgr to do it. A
  wrapper around PageHandle is used.
* Profile counters (unpin, pin, get new block time) are removed -
  similar counters in the BufferPool client are more useful.
* Changed the tuple null indicators so that they are allocated
  before each tuple, rather than in a block at the start of the
  page. This slightly reduces the memory density, but greatly
  simplifies much logic. In particular, it avoids problems with
  RowIdx and larger pages with offsets that don't fit in 16 bits.
* Buffer management of read/write streams uses the new pin-counting
  support to separate pinning of the read and write pages.
  This means that the reservation usage of an unpinned read/write
  stream does not fluctuate and the read/write iterators can always
  advance without requiring additional reservation.

Testing this required some further changes. TestEnv was refactored
so it can set up either BufferPool or BufferedBlockMgr. Some
BufferPool-related state is added to ExecEnv, QueryState and
RuntimeState, but is only created for backend tests that explicitly
create a BufferPool.

The following is left for future work:
* IMPALA-3808 (large row support) is not added. I've added
  TODOs to the code to places that will require changes.
* IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since
  it requires global changes to operators that accumulate memory.

Testing:
All of the BufferedTupleStream unit tests are ported to the new
implementation, except for ones specifically testing the small
buffer functionality.

Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Reviewed-on: http://gerrit.cloudera.org:8080/5811
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins

> Port spilling ExecNodes to new buffer pool
> ------------------------------------------
>
>                 Key: IMPALA-4674
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4674
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>              Labels: resource-management
>
> Once the buffer pool is functional we need to port the spilling exec nodes to use it
and remove BufferedBlockMgr:
> # PartitionedHashJoinNode
> # PartitionedAggregationNode
> # SortNode
> # AnalyticEvalNode



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message