drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-5594) Excessive buffer reallocations during merge phase of external sort
Date Thu, 17 Aug 2017 03:37:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5594.
--------------------------------
    Resolution: Fixed

> Excessive buffer reallocations during merge phase of external sort
> ------------------------------------------------------------------
>
>                 Key: DRILL-5594
>                 URL: https://issues.apache.org/jira/browse/DRILL-5594
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.12.0
>
>
> Consider the log file attached to DRILL-5513. The log shows an excessive number of buffer reallocations while assembling the merged output of the sort:
> {code}
> 2017-05-15 12:58:46,319 [26e5f7b8-71e8-afca-e72e-fad7be2b2416:frag:5:13] DEBUG o.a.drill.exec.vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [32768] -> [65536]
> 2017-05-15 12:58:46,321 [26e5f7b8-71e8-afca-e72e-fad7be2b2416:frag:5:13] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...5:13] DEBUG o.a.drill.exec.vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [32768] -> [65536]
> ..5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...frag:5:13] DEBUG o.a.drill.exec.vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. #
> ...
> {code}
> Hundreds of these lines appear. This means that the initial buffer allocation is too small.
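The sizes in the log double on every reallocation: the BIGINT data vector grows from 32768 to 65536 bytes, which at 8 bytes per value is 4096 to 8192 values. A minimal sketch, assuming a buffer that starts at a fixed size and doubles each time it fills (the class and method names are hypothetical, not Drill's API), shows how the number of reallocations and copied bytes grows with the row count of the batch:

```java
/**
 * Hypothetical model (not Drill code) of a vector buffer that starts at a
 * fixed size and doubles each time a write no longer fits.
 */
public class DoublingCost {

    /** Outcome of filling one buffer. */
    public static final class Cost {
        public final int reallocations; // number of grow-and-copy steps
        public final long bytesCopied;  // total bytes copied from old to new buffers
        public final long finalBytes;   // buffer size after the last doubling

        Cost(int reallocations, long bytesCopied, long finalBytes) {
            this.reallocations = reallocations;
            this.bytesCopied = bytesCopied;
            this.finalBytes = finalBytes;
        }
    }

    /**
     * @param initialBytes  initial buffer size in bytes (e.g. 32768)
     * @param bytesPerValue width of one value (e.g. 8 for BIGINT)
     * @param rowCount      number of values written into the buffer
     */
    public static Cost fill(long initialBytes, int bytesPerValue, int rowCount) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        long capacity = initialBytes;
        long needed = (long) bytesPerValue * rowCount;
        int reallocations = 0;
        long copied = 0;
        while (capacity < needed) {
            copied += capacity; // the whole old buffer is copied into the new one
            capacity *= 2;      // sizes in the log double: 32768 -> 65536, etc.
            reallocations++;
        }
        return new Cost(reallocations, copied, capacity);
    }

    public static void main(String[] args) {
        // BIGINT data vector starting at 32768 bytes, illustrative batch of 65536 rows.
        Cost c = fill(32768, 8, 65536);
        // prints "4 reallocations, 491520 bytes copied"
        System.out.println(c.reallocations + " reallocations, " + c.bytesCopied + " bytes copied");
    }
}
```

Each vector in the batch pays this cost separately, and a nullable column has both a data vector and a $bits$ vector, so a wide batch multiplies the number of reallocation lines.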
> Given that the merge phase knows the number of rows it will put into the batch, and has access to the "sizer" information for estimating VarChar widths, the merge phase should predict, and allocate, the required buffer sizes up front to avoid repeated reallocation. Each reallocation requires:
> * Allocate a new buffer (which puts pressure on the sort's memory)
> * Copy data from the old buffer to the new one
> * Zero-fill the new half (strangely, zero-fill is not done on the first allocation)
> * Free the unneeded original buffer.
> Since the sort is already slow, the above extra work just makes the problem worse.
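The proposed up-front sizing can be sketched as follows, assuming the merge knows its output row count and, for VarChar columns, an average width estimated by the sizer. The vector layout follows the names in the log: a $data$ buffer for fixed-width values (8 bytes for BIGINT or FLOAT8), a one-byte-per-row $bits$ vector for OPTIONAL columns, and a UInt4 $offsets$ vector for variable-width columns, assumed here to hold rowCount + 1 entries. The class, its methods, and the power-of-two rounding are assumptions for illustration, not Drill's actual allocation code.

```java
import java.util.List;

/**
 * Hypothetical up-front sizing (not Drill's API) for the vectors of one merged
 * output batch, given the row count and sizer-style width estimates.
 */
public class BatchSizeEstimate {

    /** Hypothetical column description. */
    public static final class Column {
        public final int fixedWidth;   // bytes per value; 0 means variable width (VarChar)
        public final boolean nullable; // OPTIONAL columns also carry a $bits$ vector
        public final int avgWidth;     // estimated bytes per VarChar value (assumed from the sizer)

        public Column(int fixedWidth, boolean nullable, int avgWidth) {
            this.fixedWidth = fixedWidth;
            this.nullable = nullable;
            this.avgWidth = avgWidth;
        }
    }

    /** Smallest power of two >= n; assumes the allocator rounds buffers up this way. */
    static long roundUpPow2(long n) {
        return n <= 1 ? 1 : Long.highestOneBit(n - 1) << 1;
    }

    /** Bytes to allocate for one column so rowCount values fit without reallocation. */
    public static long bytesFor(Column col, int rowCount) {
        long total = 0;
        if (col.nullable) {
            total += roundUpPow2(rowCount);                         // $bits$: one UInt1 byte per row
        }
        if (col.fixedWidth > 0) {
            total += roundUpPow2((long) col.fixedWidth * rowCount); // $data$
        } else {
            total += roundUpPow2(4L * (rowCount + 1));              // $offsets$: UInt4, rowCount + 1 entries
            total += roundUpPow2((long) col.avgWidth * rowCount);   // VarChar data
        }
        return total;
    }

    /** Total bytes for all columns in the batch. */
    public static long bytesForBatch(List<Column> cols, int rowCount) {
        long total = 0;
        for (Column col : cols) {
            total += bytesFor(col, rowCount);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Column> cols = List.of(
            new Column(8, true, 0),   // nullable BIGINT, like column c in the log
            new Column(0, true, 20)); // nullable VarChar, assumed 20-byte average
        System.out.println(bytesForBatch(cols, 4096)); // prints 204800
    }
}
```

Allocating these sizes once per batch replaces the allocate, copy, zero-fill and free cycle with a single allocation per vector, as long as the width estimate is not too low.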



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
