drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
Date Sat, 09 Sep 2017 04:54:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638 ]

Paul Rogers edited comment on DRILL-5670 at 9/9/17 4:53 AM:
------------------------------------------------------------

Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial allocation is good;
we didn't resize vectors as we wrote. However, each batch consumed 10,566,656 bytes of memory,
much more than the ~7 MB expected. The gross memory used is 105,667 bytes per row, larger than
the 84,413 bytes expected.
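As a quick sanity check on the arithmetic above (just a sketch from the logged numbers, not Drill code):

```python
# Gross (allocated) memory per row, derived from the spill log entries above.
batch_bytes = 10_566_656   # "consuming 10566656 bytes of memory"
batch_rows = 100           # records per spilled batch

gross_per_row = batch_bytes / batch_rows
print(round(gross_per_row))   # 105667 bytes/row, vs. the 84,413 bytes/row expected
```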

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see the size was 4,736,187 bytes per batch, or 47,362 per row. This number has
no internal fragmentation and so should match our "net" record size estimate. Our net estimate
is 48,101, so we're pretty close. The remaining discrepancy should still be explained, but our
estimate is conservative, and so is safe for memory calculations.
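The net-size figures can be checked the same way (again a sketch from the logged totals only):

```python
# Net (written) size per batch and per row, from the file summary above.
total_bytes = 246_281_737   # "Wrote 246281737 bytes"
batches = 52                # "Spilled 52 output batches"
rows_per_batch = 100

net_per_batch = total_bytes / batches         # ~4,736,187 bytes per batch
net_per_row = net_per_batch / rows_per_batch  # ~47,362 bytes per row

print(round(net_per_batch), round(net_per_row))
# The net estimate was 48,101 bytes/row, so it overshoots actual by only ~1.6%,
# i.e. conservative, as noted above.
```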



> Varchar vector throws an assertion error when allocating a new vector
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5670
>                 URL: https://issues.apache.org/jira/browse/DRILL-5670
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>         Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill,
269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, drillbit.log,
drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's repository|https://github.com/paul-rogers/drill].
Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=progers@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an improvements\n\n-
DRILL-5513\: Managed External Sort \: OOM error during the merge phase\n- DRILL-5519\: Sort
fails to spill and results in an OOM\n- DRILL-5522\: OOM during the merge and spill process
of the managed external sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase
of external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable vectors allocateNew()\n-
DRILL-5602\: Repeated List Vector fails to initialize the offset vector\n\nAll of the bugs
have to do with handling low-memory conditions, and with\ncorrectly estimating the sizes of
vectors, even when those vectors come\nfrom the spill file or from an exchange. Hence, the
changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallirahul@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=git@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET `exec.sort.disable_managed`
= false;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | exec.sort.disable_managed updated.  |
> +-------+-------------------------------------+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.memory.max_query_memory_per_node`
= 482344960;
> +-------+----------------------------------------------------+
> |  ok   |                      summary                       |
> +-------+----------------------------------------------------+
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +-------+----------------------------------------------------+
> 1 row selected (0.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_node`
= 1;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | planner.width.max_per_node updated.  |
> +-------+--------------------------------------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_query`
= 1;
> +-------+---------------------------------------+
> |  ok   |                summary                |
> +-------+---------------------------------------+
> | true  | planner.width.max_per_query updated.  |
> +-------+---------------------------------------+
> 1 row selected (0.25 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl`
order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[2222],columns[30],columns[2420],columns[1520],
columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
columns[3333],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
columns[3210] ) d where d.col433 = 'sjka skjf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk
> Fragment 2:0
> [Error Id: 26b55576-1a5c-4756-96d6-fbec25eecf03 on qa-node190.qa.lab:31010]
>   (java.lang.AssertionError) null
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():400
>     org.apache.drill.exec.vector.RepeatedVarCharVector.allocateNew():272
>     org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():37
>     org.apache.drill.exec.vector.AllocationHelper.allocate():44
>     org.apache.drill.exec.record.SmartAllocationHelper.allocateVector():117
>     org.apache.drill.exec.record.SmartAllocationHelper.allocateVector():97
>     org.apache.drill.exec.record.SmartAllocationHelper.allocateBatch():87
>     org.apache.drill.exec.physical.impl.xsort.managed.PriorityQueueCopierWrapper$BatchMerger.next():262
>     org.apache.drill.exec.physical.impl.xsort.managed.SpilledRuns.doMergeAndSpill():171
>     org.apache.drill.exec.physical.impl.xsort.managed.SpilledRuns.safeMergeAndSpill():138
>     org.apache.drill.exec.physical.impl.xsort.managed.SpilledRuns.mergeAndSpill():91
>     org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.spillFromMemory():298
>     org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.addBatch():191
>     org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():402
>     org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():338
>     org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():290
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():225
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():225
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> {code}
> The data set is too large to attach here; logs and profiles are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
