hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-10661) LLAP: investigate why GC with IO elevator disabled is so bad
Date Fri, 08 May 2015 22:35:02 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-10661:
------------------------------------
    Description: 
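Examples of running the same query (Q1) 12 times on an experimental setup with Parallel GC.
Columns: timestamp (milliseconds after the comma), DAG name, DAG time (ms), GC time counter.
The GC time counter on LLAP seems relatively reliable.
Note that non-IO jobs are also much slower for a stretch of runs: in noio-dag.csv, runs 7 through 10
take 125 to 378 seconds, versus roughly 49 to 77 seconds for the other runs. This may not be explained
entirely by GC; I am investigating it now.
Running IO and non-IO queries on the same cluster without restarting reproduces these problems as well,
again only on the non-IO runs.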

I may revisit this after the main GC tuning, but for now I am setting it aside, since the IO
elevator will be on by default when using LLAP.


{noformat}
$ cat io-dag.csv 
2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216
2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430
2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538
2015-05-08 12:13:01,327,dag_1429683757595_0843_4,38213,822179
2015-05-08 12:13:39,610,dag_1429683757595_0843_5,37513,863968
2015-05-08 12:14:19,293,dag_1429683757595_0843_6,38320,913591
2015-05-08 12:14:58,500,dag_1429683757595_0843_7,38587,972450
2015-05-08 12:15:39,017,dag_1429683757595_0843_8,39845,1085598
2015-05-08 12:16:19,708,dag_1429683757595_0843_9,39979,1165559
2015-05-08 12:17:03,174,dag_1429683757595_0843_10,42713,1447033
2015-05-08 12:17:47,557,dag_1429683757595_0843_11,43670,1454114
2015-05-08 12:18:31,440,dag_1429683757595_0843_12,43178,1380477

$ cat noio-dag.csv 
2015-05-08 11:44:05,846,dag_1429683757595_0841_1,60740,1643276
2015-05-08 11:44:55,761,dag_1429683757595_0841_2,48984,1590546
2015-05-08 11:45:48,978,dag_1429683757595_0841_3,52353,1765823
2015-05-08 11:46:44,810,dag_1429683757595_0841_4,54930,1831224
2015-05-08 11:47:47,368,dag_1429683757595_0841_5,61677,2068089
2015-05-08 11:49:05,235,dag_1429683757595_0841_6,76725,2416709
2015-05-08 11:51:56,998,dag_1429683757595_0841_7,170575,3250698
2015-05-08 11:58:16,728,dag_1429683757595_0841_8,377732,5541900
2015-05-08 12:03:17,344,dag_1429683757595_0841_9,298682,1844769
2015-05-08 12:05:23,267,dag_1429683757595_0841_10,124954,1331763
2015-05-08 12:06:35,650,dag_1429683757595_0841_11,71350,1703387
2015-05-08 12:07:42,599,dag_1429683757595_0841_12,66143,1724482
{noformat}
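The IO vs. non-IO comparison above can be reproduced with a small script. This is a minimal sketch in Python, assuming each row has the five comma-separated fields shown (a log timestamp whose milliseconds follow a comma, the DAG name, the DAG time in milliseconds, and the GC time counter) and that each row is logged when its DAG finishes. The function names and the `IO_SAMPLE` rows are hypothetical helpers for illustration, not part of Hive, Tez, or LLAP.

```python
from datetime import datetime
from statistics import median


def parse_row(line):
    """Parse one row such as
    '2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216'.

    The log timestamp puts its milliseconds after a comma, so each row has
    five comma-separated fields: date-time, millis, DAG name, DAG time, GC counter.
    """
    date_time, millis, dag_name, dag_ms, gc_counter = line.strip().split(",")
    # %f right-pads "695" to 695000 microseconds, i.e. 0.695 s.
    ts = datetime.strptime(f"{date_time}.{millis}", "%Y-%m-%d %H:%M:%S.%f")
    return ts, dag_name, int(dag_ms), int(gc_counter)


def summarize(lines):
    """Median/max DAG time and median GC counter for one file (e.g. io-dag.csv)."""
    rows = [parse_row(line) for line in lines if line.strip()]
    dag_ms = [row[2] for row in rows]
    gc = [row[3] for row in rows]
    return {
        "runs": len(rows),
        "median_dag_ms": median(dag_ms),
        "max_dag_ms": max(dag_ms),
        "median_gc_counter": median(gc),
        # Assumption: the GC counter is summed over many tasks, so this ratio is
        # only a relative indicator for comparing IO and non-IO runs.
        "median_gc_per_dag_ms": median(g / d for g, d in zip(gc, dag_ms)),
    }


def gaps_vs_dag_time(lines):
    """Pair the gap (seconds) between consecutive row timestamps with the later
    run's DAG time converted from ms to seconds; close values support the
    assumption that each row is logged when its DAG finishes."""
    rows = [parse_row(line) for line in lines if line.strip()]
    return [
        ((cur[0] - prev[0]).total_seconds(), cur[2] / 1000)
        for prev, cur in zip(rows, rows[1:])
    ]


# First three rows of io-dag.csv, used as a self-contained sample.
IO_SAMPLE = [
    "2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216",
    "2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430",
    "2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538",
]

if __name__ == "__main__":
    print(summarize(IO_SAMPLE))
    print(gaps_vs_dag_time(IO_SAMPLE))
```

To compare the full runs, pass file handles instead of the sample list, e.g. open io-dag.csv and noio-dag.csv with `with open(...) as f:` and call `summarize(f)` on each.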

  was:
Examples of running the same query (Q1) 12 times on an experimental setup with Parallel GC.
Columns: timestamp (milliseconds after the comma), DAG name, DAG time (ms), GC time counter.
The GC time counter on LLAP seems relatively reliable.
Note that non-IO jobs are also much slower for some time. This may not be explained entirely
by GC; I am investigating it now.

I may revisit this after the main GC tuning, but for now I am setting it aside, since the IO
elevator will be on by default when using LLAP.


{noformat}
$ cat io-dag.csv 
2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216
2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430
2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538
2015-05-08 12:13:01,327,dag_1429683757595_0843_4,38213,822179
2015-05-08 12:13:39,610,dag_1429683757595_0843_5,37513,863968
2015-05-08 12:14:19,293,dag_1429683757595_0843_6,38320,913591
2015-05-08 12:14:58,500,dag_1429683757595_0843_7,38587,972450
2015-05-08 12:15:39,017,dag_1429683757595_0843_8,39845,1085598
2015-05-08 12:16:19,708,dag_1429683757595_0843_9,39979,1165559
2015-05-08 12:17:03,174,dag_1429683757595_0843_10,42713,1447033
2015-05-08 12:17:47,557,dag_1429683757595_0843_11,43670,1454114
2015-05-08 12:18:31,440,dag_1429683757595_0843_12,43178,1380477

$ cat noio-dag.csv 
2015-05-08 11:44:05,846,dag_1429683757595_0841_1,60740,1643276
2015-05-08 11:44:55,761,dag_1429683757595_0841_2,48984,1590546
2015-05-08 11:45:48,978,dag_1429683757595_0841_3,52353,1765823
2015-05-08 11:46:44,810,dag_1429683757595_0841_4,54930,1831224
2015-05-08 11:47:47,368,dag_1429683757595_0841_5,61677,2068089
2015-05-08 11:49:05,235,dag_1429683757595_0841_6,76725,2416709
2015-05-08 11:51:56,998,dag_1429683757595_0841_7,170575,3250698
2015-05-08 11:58:16,728,dag_1429683757595_0841_8,377732,5541900
2015-05-08 12:03:17,344,dag_1429683757595_0841_9,298682,1844769
2015-05-08 12:05:23,267,dag_1429683757595_0841_10,124954,1331763
2015-05-08 12:06:35,650,dag_1429683757595_0841_11,71350,1703387
2015-05-08 12:07:42,599,dag_1429683757595_0841_12,66143,1724482
{noformat}


> LLAP: investigate why GC with IO elevator disabled is so bad
> ------------------------------------------------------------
>
>                 Key: HIVE-10661
>                 URL: https://issues.apache.org/jira/browse/HIVE-10661
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Prasanth Jayachandran
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
