drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5760) Performance hit: SVR causes repeated vector reallocations
Date Fri, 01 Sep 2017 01:01:38 GMT
Paul Rogers created DRILL-5760:
----------------------------------

             Summary: Performance hit: SVR causes repeated vector reallocations
                 Key: DRILL-5760
                 URL: https://issues.apache.org/jira/browse/DRILL-5760
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor


Run the query in DRILL-5753 with DEBUG logging enabled. You will see a set of vector reallocations
from the JSON reader, as described in DRILL-5759.

Later, the sort in the query completes in memory and sends its data downstream to a selection
vector remover (SVR). The SVR then triggers a very large number of additional vector
reallocations:

{code}
VarCharVector - Reallocating VarChar, new size 65536
VarCharVector - Reallocating VarChar, new size 65536
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [262144] -> [524288]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] -> [32768]
VarCharVector - Reallocating VarChar, new size 65536
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] -> [32768]
BigIntVector - Reallocating vector [col1(BIGINT:OPTIONAL)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] -> [32768]
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [524288] -> [1048576]
VarCharVector - Reallocating VarChar, new size 65536
VarCharVector - Reallocating VarChar, new size 131072
VarCharVector - Reallocating VarChar, new size 131072
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [262144] -> [524288]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [524288] -> [1048576]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [524288] -> [1048576]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
VarCharVector - Reallocating VarChar, new size 131072
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
BigIntVector - Reallocating vector [col1(BIGINT:OPTIONAL)]. # of bytes: [262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [1048576] -> [2097152]
VarCharVector - Reallocating VarChar, new size 262144
VarCharVector - Reallocating VarChar, new size 131072
VarCharVector - Reallocating VarChar, new size 262144
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [524288] -> [1048576]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [524288] -> [1048576]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [1048576] -> [2097152]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [1048576] -> [2097152]
{code}
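
To gauge how many reallocations a run produces, the DEBUG output can be tallied per vector.
The following is a rough sketch, not part of Drill: the function name tally_reallocations is
made up for illustration, and the two patterns are inferred only from the line shapes in the
excerpt above. It assumes each log record sits on a single line, with any logger prefix
(timestamp, thread, level) already stripped.

```python
import re
from collections import Counter

# Line shapes inferred from the excerpt (an assumption, not a Drill log spec):
#   <Class> - Reallocating vector [<name>(<type>:<mode>)]. # of bytes: [<old>] -> [<new>]
#   <Class> - Reallocating VarChar, new size <new>
FIXED_WIDTH = re.compile(
    r"^(\w+) - Reallocating vector \[(.+?)\(.*?\)\]\. # of bytes: \[(\d+)\] -> \[(\d+)\]"
)
VARCHAR = re.compile(r"^(\w+) - Reallocating VarChar, new size (\d+)")


def tally_reallocations(lines):
    """Return (count, largest new size) per (vector class, vector name)."""
    counts = Counter()
    largest = {}
    for raw in lines:
        line = raw.strip()
        match = FIXED_WIDTH.match(line)
        if match:
            key = (match.group(1), match.group(2))
            new_size = int(match.group(4))
        else:
            match = VARCHAR.match(line)
            if not match:
                continue  # not a reallocation record
            key = (match.group(1), "VarChar")
            new_size = int(match.group(2))
        counts[key] += 1
        largest[key] = max(largest.get(key, 0), new_size)
    return counts, largest


if __name__ == "__main__":
    import sys

    counts, largest = tally_reallocations(sys.stdin)
    for key, count in counts.most_common():
        print(key, count, largest[key])
```

Note that several vectors share an internal name such as $offsets$ or $bits$, so the tally
groups them together; the log excerpt alone does not say which parent column each belongs to.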

The likely cause is that the input data has repeated elements and the SVR does not account
for that repetition when allocating vectors. The result is multiple allocate-copy-reallocate
cycles that thrash memory and waste time.
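
The cost of those cycles can be estimated with a small model. This is only a sketch of the
doubling pattern visible in the log above, where each new size is twice the old one. It does
not call any Drill code, the function names are hypothetical, and it assumes that every
reallocation copies the entire old buffer into the new one.

```python
def doubling_cost(initial_bytes, required_bytes):
    """Model a buffer that doubles until it can hold required_bytes.

    Returns (reallocations, bytes_copied, final_bytes). bytes_copied assumes
    each reallocation copies the whole old buffer (an assumption of this model).
    """
    if initial_bytes <= 0:
        raise ValueError("initial_bytes must be positive")
    size = initial_bytes
    reallocations = 0
    bytes_copied = 0
    while size < required_bytes:
        bytes_copied += size  # old contents moved into the new buffer
        size *= 2
        reallocations += 1
    return reallocations, bytes_copied, size


def presized_bytes(row_count, avg_elements_per_row, value_width):
    """Buffer size that accounts for repetition: rows x elements per row x width."""
    return row_count * avg_elements_per_row * value_width


if __name__ == "__main__":
    # Hypothetical batch: 4096 rows, 8 repeated BIGINT elements per row.
    rows, repeats, width = 4096, 8, 8
    naive = rows * width                         # sized for one value per row
    needed = presized_bytes(rows, repeats, width)
    print(doubling_cost(naive, needed))          # grows by repeated doubling
    print(doubling_cost(needed, needed))         # sized up front: no reallocations
```

With these made-up numbers, a vector sized for one value per row starts at 32768 bytes and
must double three times to reach the 262144 bytes the repeated data needs, copying 229376
bytes along the way. Sizing for the repetition up front avoids all three reallocations.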



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
