drill-issues mailing list archives

From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4310) Memory leak in hash partition sender when query is cancelled
Date Tue, 26 Jan 2016 03:41:39 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116609#comment-15116609 ]

Deneche A. Hakim commented on DRILL-4310:
-----------------------------------------

Looking at the foreman's log (drillbit.log.133), it seems that the query failed because the
RPC connection between the foreman node and the client timed out. That timeout is what caused
the remaining fragments to be cancelled:
{noformat}
2016-01-26 00:45:16,276 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59875 (user client) timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,278 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: State change requested FAILED --> FAILED
2016-01-26 00:45:16,279 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59882 (user client) timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,280 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: State change requested FAILED --> FAILED
2016-01-26 00:45:16,338 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59885 (user client) timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,340 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: State change requested FAILED --> FAILED
{noformat}
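
To check whether the other drillbit logs show the same pattern, the client timeout entries could be pulled out with a small script along these lines. This is only a sketch: the regular expression is inferred from the three UserServer lines above rather than from any documented Drill log format, and the function name find_client_timeouts is made up for illustration.

```python
import re

# Pattern inferred from the UserServer lines quoted above. This is an
# assumption about the message layout, not a documented Drill format.
TIMEOUT_RE = re.compile(
    r"RPC connection\s+(?P<local>/\S+)\s+<-->\s+(?P<remote>/\S+)\s+"
    r"\(user client\)\s+timed out\.\s+Timeout was set\s+to\s+(?P<secs>\d+)\s+seconds"
)


def find_client_timeouts(log_text):
    """Return (local, remote, timeout_seconds) for each client RPC timeout."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m.group("local"), m.group("remote"), int(m.group("secs")))
        for m in TIMEOUT_RE.finditer(flat)
    ]


# First entry from the excerpt above, kept in its wrapped form.
sample = """2016-01-26 00:45:16,276 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection
/10.10.88.133:31010 <--> /10.10.88.133:59875 (user client) timed out.  Timeout was set
to 30 seconds. Closing connection."""

for local, remote, secs in find_client_timeouts(sample):
    print(local, remote, secs)
```

Whitespace is collapsed before matching so that the script behaves the same on a raw log file and on an excerpt that has been wrapped by mail.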

> Memory leak in hash partition sender when query is cancelled
> ------------------------------------------------------------
>
>                 Key: DRILL-4310
>                 URL: https://issues.apache.org/jira/browse/DRILL-4310
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.5.0
>            Reporter: Victoria Markman
>         Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> Query got cancelled (still investigating what caused cancellation).
> Here is an excerpt from drillbit.log
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 1000000/10240/2140160/10000000000 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
>     ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
>         DrillBuf[13122380], udle: [7140398 0..4096]
>     ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
>         DrillBuf[13122381], udle: [7140399 0..1024]
>     ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
>         DrillBuf[13122379], udle: [7140397 0..4096]
>     ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
>         DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 1000000/10240/2140160/10000000000 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
>     ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
>         DrillBuf[13122380], udle: [7140398 0..4096]
>     ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
>         DrillBuf[13122381], udle: [7140399 0..1024]
>     ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
>         DrillBuf[13122379], udle: [7140397 0..4096]
>     ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
>         DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> {code}
> Reproduced twice by running: ./run.sh -s Advanced/tpcds/tpcds_sf100/original -g smoke -t 600 -n 10 -i 100 -m
> Cluster configuration: vanilla, 48GB of memory, 4GB heap.
> Attaching query profile and logs. 
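
The allocator dump quoted in the description can be sanity-checked by adding up the ledger sizes and comparing the total with the "actual" figure in the res/actual/peak/limit header. The script below is a rough sketch of that check. The parsing is inferred from this one dump, not from a documented format, and summarize_allocator_dump is a hypothetical helper name. It expects a single dump, so the repeated copy in the log should be passed separately.

```python
import re


def summarize_allocator_dump(dump):
    """Compare the 'actual' figure of one allocator dump with its ledger sizes."""
    # Drop mail-quote prefixes, then collapse whitespace so wrapped lines rejoin.
    lines = [re.sub(r"^>\s?", "", line) for line in dump.splitlines()]
    flat = " ".join(" ".join(lines).split())

    # Header layout inferred from the dump: name, then res/actual/peak/limit.
    header = re.search(
        r"Allocator\((?P<name>[^)]+)\) "
        r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)",
        flat,
    )
    if header is None:
        raise ValueError("no allocator header found")

    # One size per "ledger[...]" entry.
    sizes = [int(s) for s in re.findall(r"ledger\[\d+\][^\[]*?size: (\d+)", flat)]
    return {
        "name": header.group("name"),
        "actual": int(header.group("actual")),
        "ledger_count": len(sizes),
        "ledger_total": sum(sizes),
    }


# Header and ledger lines copied from the dump in the description.
sample = """\
Allocator(op:2:2:0:HashPartitionSender) 1000000/10240/2140160/10000000000 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 4
    ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
    ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
    ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
    ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
  reservations: 0
"""

print(summarize_allocator_dump(sample))
```

For this dump the four ledgers (4096 + 1024 + 4096 + 1024) add up to 10240 bytes, which matches the "actual" value in the header, so the four outstanding buffers account for all of the memory still held by the HashPartitionSender allocator.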



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
