drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5601) Rollup of External Sort memory management fixes
Date Thu, 13 Jul 2017 22:21:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086509#comment-16086509 ]

ASF GitHub Bot commented on DRILL-5601:
---------------------------------------

Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/860#discussion_r127344297
  
    --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
    @@ -247,27 +249,26 @@ public void copyEntry(int toIndex, ValueVector from, int fromIndex) {
       }
     
       @Override
    -  public int getAllocatedByteCount() {
    -    return offsetVector.getAllocatedByteCount() + super.getAllocatedByteCount();
    +  public void getLedgers(Set<BufferLedger> ledgers) {
    +    offsetVector.getLedgers(ledgers);
    +    super.getLedgers(ledgers);
       }
     
       @Override
    -  public int getPayloadByteCount() {
    -    UInt${type.width}Vector.Accessor a = offsetVector.getAccessor();
    -    int count = a.getValueCount();
    -    if (count == 0) {
    +  public int getPayloadByteCount(int valueCount) {
    +    if (valueCount == 0) {
           return 0;
    -    } else {
    -      // If 1 or more values, then the last value is set to
    -      // the offset of the next value, which is the same as
    -      // the length of existing values.
    -      // In addition to the actual data bytes, we must also
    -      // include the "overhead" bytes: the offset vector entries
    -      // that accompany each column value. Thus, total payload
    -      // size is consumed text bytes + consumed offset vector
    -      // bytes.
    -      return a.get(count-1) + offsetVector.getPayloadByteCount();
         }
    +    // If 1 or more values, then the last value is set to
    +    // the offset of the next value, which is the same as
    +    // the length of existing values.
    +    // In addition to the actual data bytes, we must also
    +    // include the "overhead" bytes: the offset vector entries
    +    // that accompany each column value. Thus, total payload
    +    // size is consumed text bytes + consumed offset vector
    +    // bytes.
    +    return offsetVector.getAccessor().get(valueCount) +
    --- End diff --
    
    Should this be **(valueCount - 1)** ??
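
One way to reason about the index question above is a small standalone model of an offset vector. This is only a sketch, not Drill code: the class `OffsetVectorSketch`, its methods, and the 4-byte offset width are hypothetical. It assumes the offset vector holds `valueCount + 1` entries, with entry 0 set to 0 and entry `i + 1` set to the end of value `i`. That assumption matches the removed code, which indexed with the offset vector's own value count minus one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of a variable-length vector's offsets; not Drill's API.
class OffsetVectorSketch {

  // Assumed width of one offset entry in bytes (the template uses UInt${type.width}).
  static final int OFFSET_WIDTH = 4;

  // offsets[0] = 0; offsets[i + 1] = byte position just past value i.
  public static int[] buildOffsets(String[] values) {
    int[] offsets = new int[values.length + 1];
    for (int i = 0; i < values.length; i++) {
      offsets[i + 1] = offsets[i] + values[i].getBytes(StandardCharsets.UTF_8).length;
    }
    return offsets;
  }

  // Data bytes plus the offset entries that describe them.
  public static int payloadByteCount(int[] offsets, int valueCount) {
    if (valueCount == 0) {
      return 0;
    }
    // Under the assumed layout, the entry at valueCount is the total data length.
    int dataBytes = offsets[valueCount];
    // Assumption: valueCount values consume valueCount + 1 offset entries.
    int offsetBytes = (valueCount + 1) * OFFSET_WIDTH;
    return dataBytes + offsetBytes;
  }

  public static void main(String[] args) {
    String[] values = {"drill", "sort", "memory"};
    int[] offsets = buildOffsets(values);   // {0, 5, 9, 15}
    System.out.println(payloadByteCount(offsets, values.length)); // prints 31
  }
}
```

In this model, `offsets[valueCount - 1]` is 9, which omits the 6 bytes of the last value. So if `valueCount` counts data values rather than offset entries, `get(valueCount)` reads the total data length; `valueCount - 1` would be right only if the argument were the offset vector's own entry count.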



> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
>                 Key: DRILL-5601
>                 URL: https://issues.apache.org/jira/browse/DRILL-5601
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very difficult problem
> of managing memory within Drill in order for the external sort to stay within a memory budget.
> In general, the fixes relate to better estimating memory used by the three ways that Drill
> allocates vector memory (see DRILL-5522) and to predicting the size of vectors that the sort
> will create, to avoid repeated realloc-copy cycles (see DRILL-5594).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
