impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IMPALA-5084) Backend support for large rows in Sorter
Date Thu, 16 Mar 2017 22:52:42 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-5084:
----------------------------------
    Description: 
See IMPALA-3208 for the context.

Sorter::Run changes:
* We can use a similar approach to that used for BufferedTupleStream as described in IMPALA-5085

Testing:
Needs end-to-end tests exercising all operators with large rows

  was:
We need to ensure that all exec nodes can support rows larger than the default page size.
The default page size will be a query option, so users can always increase it. However, minimum
memory requirements will scale proportionally, which makes this less appealing.

We should also add a max_row_size query option that controls the maximum size of rows supported
by operators (at least those that use the reservation mechanism). We should be able to support
large rows with only a single read and write buffer of the max row size. I.e. the minimum
requirement for an operator would be ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size.
This requires the following changes to the operators:
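
As a rough illustration of the minimum-requirement formula above, here is a small C++ sketch.
MinReservationBytes is a hypothetical helper made up for this example, not a function in the
Impala codebase; it only evaluates the expression as written, and it assumes the operator
needs at least two buffers (one read, one write).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Impala): evaluates
//   ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size
// i.e. one read buffer and one write buffer are sized for the largest
// supported row, and every other buffer keeps the default size.
int64_t MinReservationBytes(int64_t min_buffers, int64_t default_buffer_size,
                            int64_t max_row_size) {
  // Assumption for this sketch: at least one read and one write buffer.
  assert(min_buffers >= 2);
  assert(default_buffer_size > 0 && max_row_size > 0);
  return (min_buffers - 2) * default_buffer_size + 2 * max_row_size;
}
```

For example, with min_buffers = 3, a 64 KB default buffer and a 512 KB max_row_size, this gives
65536 + 2 * 524288 = 1114112 bytes, compared with 3 * 524288 = 1572864 bytes if the default
page size itself were raised to 512 KB.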

BufferedTupleStream changes:
* Rows <= the default page size are written as before
* Rows that don't fit in the default page size get written into a larger page, with one row
per page.
* Upon writing a large row to an unpinned stream, the page is immediately unpinned and we
immediately advance to the next write page, so that the large page is not kept pinned outside
of the AddRow() call.
* We should only be reading from one unpinned stream at a time, so only one large page is
required there.
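
The write-path rules above could be sketched roughly as follows. This is only an illustration:
PagePlan and PlanRowWrite are hypothetical names, not the BufferedTupleStream API, and sizing
the large page exactly to the row is an assumption (the description only says "a larger page").

```cpp
#include <cstdint>

// Hypothetical description of how a single row would be written.
struct PagePlan {
  int64_t page_size;       // size of the page the row is written into
  bool dedicated_page;     // true if the row gets a page of its own
  bool unpin_immediately;  // true if the page is unpinned right after the write
};

// Sketch of the rules listed above (names and exact sizing are assumptions):
// - rows that fit in the default page size are written as before;
// - larger rows get their own larger page, one row per page;
// - in an unpinned stream the large page is unpinned right after the write,
//   so it does not stay pinned outside of the AddRow() call.
PagePlan PlanRowWrite(int64_t row_size, int64_t default_page_size,
                      bool stream_pinned) {
  if (row_size <= default_page_size) {
    // Regular row: shares a default-sized page with other rows.
    return PagePlan{default_page_size, false, false};
  }
  // Large row: dedicated page, assumed here to be exactly row-sized.
  return PagePlan{row_size, true, !stream_pinned};
}
```

In this sketch a 4096-byte row with a 1024-byte default page in an unpinned stream yields a
dedicated 4096-byte page that is unpinned immediately, while the same row in a pinned stream
keeps its page pinned.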

Sorter::Run changes:
* A similar approach to the above can be used.

Testing:
Needs end-to-end tests exercising all operators with large rows


> Backend support for large rows in Sorter
> ----------------------------------------
>
>                 Key: IMPALA-5084
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5084
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Minor
>              Labels: resource-management
>
> See IMPALA-3208 for the context.
> Sorter::Run changes:
> * We can use a similar approach to that used for BufferedTupleStream as described in IMPALA-5085
> Testing:
> Needs end-to-end tests exercising all operators with large rows



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
