lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Forest Soup (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9828) Very long young generation stop the world GC pause
Date Thu, 08 Dec 2016 13:53:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732269#comment-15732269
] 

Forest Soup commented on SOLR-9828:
-----------------------------------

Thanks Shawn, 

I'll use this mail thread talking on it instead of this JIRA. 

Could you please help comment on the question in the mail thread? Thanks!

1, As you can see in the gc log, the long GC pause is not a full GC. It's a young generation
GC instead.   
In our case, full gc is fast and young gc got some long stw pause. 
Do you have any comments on that, as we usually believe full gc may cause longer pause, but
young generation should be ok? 

2, Will these JVM options make it better? 
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 

2016-11-22T20:43:16.463+0000: 2942054.509: Total time for which application threads were stopped:
0.0029195 seconds, Stopping threads took: 0.0000804 seconds 
{Heap before GC invocations=2246 (full 0): 
 garbage-first heap   total 26673152K, used 4683965K [0x00007f0c10000000, 0x00007f0c108065c0,
0x00007f1410000000) 
  region size 8192K, 162 young (1327104K), 17 survivors (139264K) 
 Metaspace       used 56487K, capacity 57092K, committed 58368K, reserved 59392K 
2016-11-22T20:43:16.555+0000: 2942054.602: [GC pause (G1 Evacuation Pause) (young) 
Desired survivor size 88080384 bytes, new threshold 15 (max 15) 
- age   1:   28176280 bytes,   28176280 total 
- age   2:    5632480 bytes,   33808760 total 
- age   3:    9719072 bytes,   43527832 total 
- age   4:    6219408 bytes,   49747240 total 
- age   5:    4465544 bytes,   54212784 total 
- age   6:    3417168 bytes,   57629952 total 
- age   7:    5343072 bytes,   62973024 total 
- age   8:    2784808 bytes,   65757832 total 
- age   9:    6538056 bytes,   72295888 total 
- age  10:    6368016 bytes,   78663904 total 
- age  11:     695216 bytes,   79359120 total 
, 97.2044320 secs] 
   [Parallel Time: 19.8 ms, GC Workers: 18] 
      [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: 2942054612.7, Diff:
10.6] 
      [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: 43.5] 
      [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0] 
         [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192] 
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] 
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] 
      [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: 165.9] 
      [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1] 
         [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27] 
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6] 
      [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6, Sum: 308.7] 
      [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: 2942054621.8, Diff:
0.0] 
   [Code Root Fixup: 0.1 ms] 
   [Code Root Purge: 0.0 ms] 
   [Clear CT: 0.2 ms] 
   [Other: 97184.3 ms] 
      [Choose CSet: 0.0 ms] 
      [Ref Proc: 8.5 ms] 
      [Ref Enq: 0.2 ms] 
      [Redirty Cards: 0.2 ms] 
      [Humongous Register: 0.1 ms] 
      [Humongous Reclaim: 0.1 ms] 
      [Free CSet: 0.4 ms] 
   [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: 4574.2M(25.4G)->3450.8M(26.8G)]

Heap after GC invocations=2247 (full 0): 
 garbage-first heap   total 28049408K, used 3533601K [0x00007f0c10000000, 0x00007f0c10806b00,
0x00007f1410000000) 
  region size 8192K, 21 young (172032K), 21 survivors (172032K) 
 Metaspace       used 56487K, capacity 57092K, committed 58368K, reserved 59392K 
} 
 [Times: user=0.00 sys=94.28, real=97.19 secs] 
2016-11-22T20:44:53.760+0000: 2942151.806: Total time for which application threads were stopped:
97.2053747 seconds, Stopping threads took: 0.0001373 seconds

> Very long young generation stop the world GC pause 
> ---------------------------------------------------
>
>                 Key: SOLR-9828
>                 URL: https://issues.apache.org/jira/browse/SOLR-9828
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.3.2
>         Environment: Linux Redhat 64bit
>            Reporter: Forest Soup
>
> We are using oracle jdk8u92 64bit.
> The jvm memory related options:
> -Xms32768m 
> -Xmx32768m 
> -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/mnt/solrdata1/log 
> -XX:+UseG1GC 
> -XX:+PerfDisableSharedMem 
> -XX:+ParallelRefProcEnabled 
> -XX:G1HeapRegionSize=8m 
> -XX:MaxGCPauseMillis=100 
> -XX:InitiatingHeapOccupancyPercent=35 
> -XX:+AggressiveOpts 
> -XX:+AlwaysPreTouch 
> -XX:ConcGCThreads=16 
> -XX:ParallelGCThreads=18 
> -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/mnt/solrdata1/log 
> -verbose:gc 
> -XX:+PrintHeapAtGC 
> -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps 
> -XX:+PrintGCTimeStamps 
> -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime 
> -Xloggc:/mnt/solrdata1/log/solr_gc.log
> It usually works fine. But recently we met very long stop the world young generation
GC pause. Some snippets of the gc log are as below:
> 2016-11-22T20:43:16.436+0000: 2942054.483: Total time for which application threads were
stopped: 0.0005510 seconds, Stopping threads took: 0.0000894 seconds
> 2016-11-22T20:43:16.463+0000: 2942054.509: Total time for which application threads were
stopped: 0.0029195 seconds, Stopping threads took: 0.0000804 seconds
> {Heap before GC invocations=2246 (full 0):
>  garbage-first heap   total 26673152K, used 4683965K [0x00007f0c10000000, 0x00007f0c108065c0,
0x00007f1410000000)
>   region size 8192K, 162 young (1327104K), 17 survivors (139264K)
>  Metaspace       used 56487K, capacity 57092K, committed 58368K, reserved 59392K
> 2016-11-22T20:43:16.555+0000: 2942054.602: [GC pause (G1 Evacuation Pause) (young)
> Desired survivor size 88080384 bytes, new threshold 15 (max 15)
> - age   1:   28176280 bytes,   28176280 total
> - age   2:    5632480 bytes,   33808760 total
> - age   3:    9719072 bytes,   43527832 total
> - age   4:    6219408 bytes,   49747240 total
> - age   5:    4465544 bytes,   54212784 total
> - age   6:    3417168 bytes,   57629952 total
> - age   7:    5343072 bytes,   62973024 total
> - age   8:    2784808 bytes,   65757832 total
> - age   9:    6538056 bytes,   72295888 total
> - age  10:    6368016 bytes,   78663904 total
> - age  11:     695216 bytes,   79359120 total
> , 97.2044320 secs]
>    [Parallel Time: 19.8 ms, GC Workers: 18]
>       [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: 2942054612.7,
Diff: 10.6]
>       [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: 43.5]
>       [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0]
>          [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192]
>       [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
>       [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: 165.9]
>       [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1]
>          [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6]
>       [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6, Sum: 308.7]
>       [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: 2942054621.8, Diff:
0.0]
>    [Code Root Fixup: 0.1 ms]
>    [Code Root Purge: 0.0 ms]
>    [Clear CT: 0.2 ms]
>    [Other: 97184.3 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 8.5 ms]
>       [Ref Enq: 0.2 ms]
>       [Redirty Cards: 0.2 ms]
>       [Humongous Register: 0.1 ms]
>       [Humongous Reclaim: 0.1 ms]
>       [Free CSet: 0.4 ms]
>    [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: 4574.2M(25.4G)->3450.8M(26.8G)]
> Heap after GC invocations=2247 (full 0):
>  garbage-first heap   total 28049408K, used 3533601K [0x00007f0c10000000, 0x00007f0c10806b00,
0x00007f1410000000)
>   region size 8192K, 21 young (172032K), 21 survivors (172032K)
>  Metaspace       used 56487K, capacity 57092K, committed 58368K, reserved 59392K
> }
>  [Times: user=0.00 sys=94.28, real=97.19 secs] 
> 2016-11-22T20:44:53.760+0000: 2942151.806: Total time for which application threads were
stopped: 97.2053747 seconds, Stopping threads took: 0.0001373 seconds
> 2016-11-22T20:44:53.762+0000: 2942151.809: Total time for which application threads were
stopped: 0.0008138 seconds, Stopping threads took: 0.0001258 seconds
> And CPU reached near 100% during the GC.
> The load is normal at that time according to the stats of solr update/select/delete handler
and jetty request log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message