drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-4723) GC Memory Allocation Issues
Date Tue, 14 Jun 2016 22:09:30 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330729#comment-15330729 ]

Jacques Nadeau edited comment on DRILL-4723 at 6/14/16 10:09 PM:
-----------------------------------------------------------------

I'm pretty sure that a decent chunk of Parquet writing is done on the Java heap. That memory consumption
would go up as you increase block size, since an entire block for each writer thread must be held
in memory (~1 GB x number of threads, plus any overhead structures). GC death is very hard to manage
gracefully, especially in a distributed system, since the GC pauses cause loss of cluster
cohesion. There are ways to combat this, such as increasing ZooKeeper timeouts, but that falls under
the "GC tuning" umbrella. This is why most of Drill doesn't use the Java heap for data. I'd guess
that the best way to fix this would be to go through the rest of the Parquet write pipeline
and support off-heap data structures, especially for slab allocation. Our initial work of
this nature was mostly focused on updating the Parquet readers to use off-heap structures.
With that in place, Drill can gracefully manage out-of-memory conditions.
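To make that arithmetic concrete, here is a minimal back-of-envelope sketch. The 24 GB heap and 1 GB block size come from the reported environment below; the concurrent writer-fragment count and the overhead allowance are hypothetical figures, not measurements.

    # Rough heap estimate for concurrent Parquet writers (sketch, not a measurement).
    block_size_gb = 1        # 1024m Parquet block size from the report below
    writer_fragments = 20    # hypothetical concurrent writer fragments on one node
    overhead_gb = 2          # hypothetical allowance for other heap structures
    heap_gb = 24             # heap per drillbit, from the reported environment

    needed_gb = block_size_gb * writer_fragments + overhead_gb
    print("estimated heap needed: %d GB of %d GB available" % (needed_gb, heap_gb))
    # With numbers in this range the buffered blocks alone approach the heap limit,
    # which lines up with the 23G->23G(24G) Full GC entries in the attached gclog.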

In terms of workarounds: you'll either need to reduce parallelization or increase heap to
support writing as many concurrent 1 GB Parquet files as you are attempting to write.
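For concreteness, a sketch of what those workarounds might look like. The option names below are my best guess at the relevant knobs, not something verified against this cluster; heap itself would be raised via DRILL_HEAP in conf/drill-env.sh, and the ZooKeeper session timeout in drill-override.conf can be bumped if GC pauses stay long.

    -- cap per-node parallelization for the CTAS (the value 8 is just an example)
    ALTER SYSTEM SET `planner.width.max_per_node` = 8;

    -- or keep the parallelism and drop the Parquet block size back to the 512m that worked
    ALTER SYSTEM SET `store.parquet.block-size` = 536870912;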


> GC Memory Allocation Issues
> ---------------------------
>
>                 Key: DRILL-4723
>                 URL: https://issues.apache.org/jira/browse/DRILL-4723
>             Project: Apache Drill
>          Issue Type: Bug
>          Components:  Server
>    Affects Versions: 1.6.0
>         Environment: 5 Drill bits, MapR 1.6 release. 84GB of Direct Memory, 24GB of Heap
>            Reporter: John Omernik
>              Labels: memory, stability
>             Fix For: Future
>
>         Attachments: node3.log
>
>
> This issue is reposted from the Drill User List. More logs are available on request in the comments (please specify which logs you want).
> High level: the Drill cluster gets into a bad state when this occurs.
> This is what I have thus far... I can provide more complete logs on a one-on-one basis.

> The cluster was completely mine, with fresh logs. I ran a CTAS query on a large table that has over 100 fields. This query works well in other cases; however, I was experimenting with the block size, both in MapR FS and Drill Parquet. I had successfully tested 512m on each, but this case was different. Here are the facts in this setup:
> - No Compression in MapRFS - Using Standard Parquet Snappy Compression
> - MapR Block Size 1024m
> - Parquet Block size 1024m
> - Query  ends up disappearing in the profiles
> - The UI page listing bits only shows 4 bits, although 5 are running (the node 03 process is running, but no longer appears in the bit list)
> - Error (copied below)  from rest API
> - No output in stdout or stderr on node3. Only two nodes actually had "Parquet Writing" logs. The other three did not show any issues/errors on stdout.
> - I have standard log files, gclogs, the profile.json (before it disappeared), and the physical plan. Only the components that looked possibly at issue are included here.
> - The Node 3 GC log shows a bunch of "Full GC Allocation Failures" that take 4 seconds or more (when I've seen this in other cases, this time can balloon to 8 seconds or more)
> - The node 3 output log shows some really long RPC delays. Perhaps the GCs prevent RPC communication and create a snowball loop effect?
> Other logs can be provided upon request if people are interested. I just didn't want to flood the whole list with all the logs.
> Thanks!
> John
> Rest Error:
> ./load_day.py 2016-05-09
> Drill Rest Endpoint: https://drillprod.marathonprod.zeta:20000
> Day: 2016-05-09
> /usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: InsecureRequestWarning:
Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
See: https://urllib3.readthedocs.org/en/latest/security.html
>   InsecureRequestWarning)
> Authentication successful
> Error encountered: 500
> {
>   "errorMessage" : "SYSTEM ERROR: ForemanException: One more more nodes lost connectivity
during query.  Identified nodes were [atl1ctuzeta03:20001].\n\n\n[Error Id: d7dd0120-f7c0-44ef-ac54-29c746b26354
on atl1ctuzeta01:20001"
> }
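(For context, this is roughly the kind of call load_day.py presumably makes: a CTAS submitted over Drill's REST API with verify=False, which is also where the InsecureRequestWarning above comes from. The actual script isn't attached, so the credentials and query text below are placeholders, not the real thing.)

    import requests

    BASE = "https://drillprod.marathonprod.zeta:20000"   # endpoint from the output above
    s = requests.Session()

    # Drill's web authentication is a form login (placeholder credentials).
    s.post(BASE + "/j_security_check",
           data={"j_username": "user", "j_password": "pass"},
           verify=False)                                  # triggers the InsecureRequestWarning

    # Submit the day's CTAS; the query text is a placeholder for the real 100+ column CTAS.
    resp = s.post(BASE + "/query.json",
                  json={"queryType": "SQL",
                        "query": "CREATE TABLE dfs.tmp.`2016-05-09` AS SELECT * FROM ..."},
                  verify=False)
    print(resp.status_code)   # 500, with the ForemanException payload shown above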
> Possible issue in Node3 Log:
> 2016-06-14 17:25:27,860 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:90] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:90: State to report: RUNNING
> 2016-06-14 17:25:27,871 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:70] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:70: State change requested AWAITING_ALLOCATION -->
RUNNING
> 2016-06-14 17:25:27,871 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:70] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:70: State to report: RUNNING
> 2016-06-14 17:43:41,869 [BitServer-4] WARN  o.a.d.exec.rpc.control.ControlClient - Message
of mode RESPONSE of rpc type 1 took longer than 500ms.  Actual duration was 4192ms.
> 2016-06-14 17:45:36,720 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:0: State change requested RUNNING --> CANCELLATION_REQUESTED
> 2016-06-14 17:45:45,740 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:0: State to report: CANCELLATION_REQUESTED
> 2016-06-14 17:46:15,318 [BitServer-3] WARN  o.a.d.exec.rpc.control.ControlServer - Message
of mode REQUEST of rpc type 6 took longer than 500ms.  Actual duration was 55328ms.
> 2016-06-14 17:46:36,057 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:5: State change requested RUNNING --> CANCELLATION_REQUESTED
> 2016-06-14 17:46:44,620 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:5: State to report: CANCELLATION_REQUESTED
> 2016-06-14 17:47:01,393 [BitServer-3] WARN  o.a.d.exec.rpc.control.ControlServer - Message
of mode REQUEST of rpc type 6 took longer than 500ms.  Actual duration was 29781ms.
> 2016-06-14 17:47:09,463 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:10: State change requested RUNNING --> CANCELLATION_REQUESTED
> 2016-06-14 17:47:26,967 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:10: State to report: CANCELLATION_REQUESTED
> 2016-06-14 17:47:55,593 [BitServer-3] WARN  o.a.d.exec.rpc.control.ControlServer - Message
of mode REQUEST of rpc type 6 took longer than 500ms.  Actual duration was 46130ms.
> 2016-06-14 17:48:04,497 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:15: State change requested RUNNING --> CANCELLATION_REQUESTED
> 2016-06-14 17:48:12,742 [BitServer-4] WARN  o.a.d.exec.rpc.control.ControlClient - Message
of mode RESPONSE of rpc type 1 took longer than 500ms.  Actual duration was 4236ms.
> 2016-06-14 17:48:42,328 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:15: State to report: CANCELLATION_REQUESTED
> 2016-06-14 17:49:36,351 [BitServer-4] WARN  o.a.d.exec.rpc.control.ControlClient - Message
of mode RESPONSE of rpc type 1 took longer than 500ms.  Actual duration was 4260ms.
> 2016-06-14 17:49:36,351 [BitServer-3] WARN  o.a.d.exec.rpc.control.ControlServer - Message
of mode REQUEST of rpc type 6 took longer than 500ms.  Actual duration was 91854ms.
> 2016-06-14 17:50:35,273 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:20: State change requested RUNNING --> CANCELLATION_REQUESTED
> 2016-06-14 17:50:39,322 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:15] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 289fc208-7266-6a81-73a1-709efff6c412:1:15: State change requested CANCELLATION_REQUESTED
--> FAILED
> 2016-06-14 17:50:51,546 [CONTROL-rpc-event-queue] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 289fc208-7266-6a81-73a1-709efff6c412:1:20: State to report: CANCELLATION_REQUESTED
> 2016-06-14 17:51:36,905 [BitServer-4] WARN  o.a.d.exec.rpc.control.ControlClient - Message
of mode RESPONSE of rpc type 1 took longer than 500ms.  Actual duration was 4426ms.
> 2016-06-14 17:52:30,805 [BitServer-3] WARN  o.a.d.exec.rpc.control.ControlServer - Message
of mode REQUEST of rpc type 6 took longer than 500ms.  Actual duration was 98767ms.
> 2016-06-14 17:52:47,042 [BitServer-4] WARN  o.a.d.exec.rpc.control.ControlClient - Message
of mode RESPONSE of rpc type 1 took longer than 500ms.  Actual duration was 12041ms.
> Possible issues in Node3 gclog:
> 1819.137: [Full GC (Allocation Failure)  23G->23G(24G), 4.0657064 secs]
> 1823.205: [GC concurrent-mark-abort]
> 1823.221: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G), 0.0382934 secs]
> 1823.271: [GC pause (G1 Evacuation Pause) (young) (initial-mark) 23G->23G(24G), 0.0250215
secs]
> 1823.296: [GC concurrent-root-region-scan-start]
> 1823.296: [GC concurrent-root-region-scan-end, 0.0000105 secs]
> 1823.296: [GC concurrent-mark-start]
> 1823.308: [Full GC (Allocation Failure)  23G->23G(24G), 4.3719713 secs]
> 1827.694: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G), 0.0312666 secs]
> 1827.727: [GC concurrent-mark-abort]
> 1827.735: [GC pause (G1 Evacuation Pause) (young) 23G->23G(24G), 0.0212359 secs]
> 1827.766: [Full GC (Allocation Failure)  23G->23G(24G), 3.9308980 secs]
> 1831.710: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G), 0.0298229 secs]
> 1831.750: [GC pause (G1 Evacuation Pause) (young) (initial-mark) 23G->23G(24G), 0.0268410
secs]
> 1831.777: [GC concurrent-root-region-scan-start]
> 1831.777: [GC concurrent-root-region-scan-end, 0.0000212 secs]
> 1831.777: [GC concurrent-mark-start]
> 1831.789: [Full GC (Allocation Failure)  23G->23G(24G), 3.9250410 secs]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
