drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5285) Provide detailed, accurate estimate of size consumed by a record batch
Date Tue, 28 Mar 2017 19:58:41 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945830#comment-15945830 ]

Paul Rogers commented on DRILL-5285:

This is primarily an implementation issue. The QA-visible result is that the external sort
does not run out of memory regardless of input record sizes or variations (as long as we
stay away from memory-fragmentation issues).

> Provide detailed, accurate estimate of size consumed by a record batch
> ----------------------------------------------------------------------
>                 Key: DRILL-5285
>                 URL: https://issues.apache.org/jira/browse/DRILL-5285
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
> DRILL-5080 introduced a {{RecordBatchSizer}} that estimates the space taken by a record
> batch and determines batch "density."
> Drill provides a large variety of vectors, each with its own internal structure and
> collection of vectors. For example, fixed-width vectors use just a data vector. Nullable
> vectors add an "is set" vector. Variable-length vectors add an offset vector. Repeated
> vectors add a second offset vector.
> The original {{RecordBatchSizer}} attempted to compute sizes for all these vector types,
> but the complexity got out of hand. This ticket proposes to bite the bullet and move the
> calculations into each vector type so that the {{RecordBatchSizer}} can simply use the
> results of those calculations.
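
To make the proposal in the description concrete, here is a minimal sketch of what "move the
calculations into each vector type" could look like. It is not Drill's actual code: the
SizedVector interface, the four vector classes, and BatchSizerSketch are hypothetical names
invented for illustration. The sketch also assumes one byte per "is set" flag and a 4-byte
offset entry with one more entry than there are values; neither figure is stated in the
ticket.

```java
import java.util.List;

// Hypothetical interface (not a Drill API): each vector type reports its own size.
interface SizedVector {
    long payloadBytes();
}

// Fixed-width vector: just a data vector.
final class FixedVector implements SizedVector {
    private final int valueCount;
    private final int widthBytes;

    FixedVector(int valueCount, int widthBytes) {
        this.valueCount = valueCount;
        this.widthBytes = widthBytes;
    }

    @Override
    public long payloadBytes() {
        return (long) valueCount * widthBytes;
    }
}

// Nullable vector: an "is set" vector plus the wrapped values.
final class NullableVector implements SizedVector {
    private final SizedVector values;
    private final int valueCount;

    NullableVector(SizedVector values, int valueCount) {
        this.values = values;
        this.valueCount = valueCount;
    }

    @Override
    public long payloadBytes() {
        // Assumption: one byte per "is set" flag.
        return valueCount + values.payloadBytes();
    }
}

// Variable-length vector: an offset vector plus the data bytes.
final class VarLengthVector implements SizedVector {
    private final int valueCount;
    private final long dataBytes;

    VarLengthVector(int valueCount, long dataBytes) {
        this.valueCount = valueCount;
        this.dataBytes = dataBytes;
    }

    @Override
    public long payloadBytes() {
        // Assumption: 4-byte offsets, valueCount + 1 entries.
        return 4L * (valueCount + 1) + dataBytes;
    }
}

// Repeated vector: a further offset vector (one entry per row) over the elements.
final class RepeatedVector implements SizedVector {
    private final int rowCount;
    private final SizedVector elements;

    RepeatedVector(int rowCount, SizedVector elements) {
        this.rowCount = rowCount;
        this.elements = elements;
    }

    @Override
    public long payloadBytes() {
        // Assumption: 4-byte offsets, rowCount + 1 entries.
        return 4L * (rowCount + 1) + elements.payloadBytes();
    }
}

// The sizer no longer knows any vector internals; it only sums the results.
final class BatchSizerSketch {
    static long batchBytes(List<SizedVector> columns) {
        long total = 0;
        for (SizedVector column : columns) {
            total += column.payloadBytes();
        }
        return total;
    }
}
```

Under these assumptions, a nullable column of 1000 four-byte values reports 5000 bytes
(1000 flag bytes plus 4000 data bytes), and the sizer needs no per-type logic of its own.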
