cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7761) Upgrade netty
Date Wed, 01 Oct 2014 20:17:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155426#comment-14155426 ]

T Jake Luciani edited comment on CASSANDRA-7761 at 10/1/14 8:17 PM:
--------------------------------------------------------------------

I've done some further work optimizing our Netty server and seen significant gains.

https://github.com/tjake/cassandra/tree/netty-perf

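The primary change is to stop using a separate thread pool for the dispatch step and instead
reuse the NIO threads. This cuts a large amount of latency from each request, as the runs below
show: at 4 client threads mean latency drops from 0.3 to 0.2 ms. At 4, 8 and 16 threads, throughput
rises from 12150, 18289 and 24739 op/s to 17810, 22988 and 27261 op/s. Beyond that, C* itself becomes
the bottleneck. The change also substantially reduces CPU usage and thread switching: the server used
124 s of user+system CPU instead of 177 s. I also switched on Netty's native epoll transport.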
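To illustrate where the saved thread switching comes from, here is a minimal stdlib-only sketch. It is not the code on the netty-perf branch. A single-threaded executor named `event-loop` stands in for a Netty NIO thread. `DispatchSketch`, `Stats` and `executeQuery` are hypothetical names. In handoff mode each request crosses to a separate dispatch pool, and its response crosses back to the event loop for writing, which is two thread switches per request. In inline mode the request runs on the event-loop thread and needs no switches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DispatchSketch {
    /** Counters collected during one run (hypothetical model, not Cassandra code). */
    static final class Stats {
        final AtomicInteger threadHops = new AtomicInteger();      // cross-thread handoffs
        final AtomicInteger executedOnLoop = new AtomicInteger();  // requests run on the event loop
    }

    /** Processes `requests` requests, either inline on the event loop or via a dispatch pool. */
    static Stats run(boolean inline, int requests) {
        ExecutorService loop = Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop"));
        ExecutorService dispatchPool = Executors.newFixedThreadPool(4);
        Stats stats = new Stats();
        CountDownLatch responsesWritten = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            // Each task models a decoded request being delivered on the event-loop thread.
            loop.execute(() -> {
                if (inline) {
                    executeQuery(stats);                      // run the request right here
                    responsesWritten.countDown();             // "write" the response
                } else {
                    stats.threadHops.incrementAndGet();       // event loop -> dispatch pool
                    dispatchPool.execute(() -> {
                        executeQuery(stats);
                        stats.threadHops.incrementAndGet();   // dispatch pool -> event loop
                        loop.execute(responsesWritten::countDown);
                    });
                }
            });
        }
        try {
            responsesWritten.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            dispatchPool.shutdown();
            loop.shutdown();
        }
        return stats;
    }

    /** Stand-in for query execution; records whether it ran on the event-loop thread. */
    private static void executeQuery(Stats stats) {
        if (Thread.currentThread().getName().equals("event-loop")) {
            stats.executedOnLoop.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        for (boolean inline : new boolean[] {false, true}) {
            Stats s = run(inline, 1000);
            System.out.println((inline ? "inline " : "handoff") + ": hops=" + s.threadHops
                    + ", executed on event loop=" + s.executedOnLoop);
        }
    }
}
```

The model ignores the cost of the query itself. Running requests inline is only a win as long as request execution does not block the I/O thread for long.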

Sharing the thread pool also lets Netty batch responses naturally, so we no longer need CASSANDRA-5663.
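In Netty, `ChannelHandlerContext.write` only queues data in the channel's outbound buffer, and `flush` pushes it to the socket. `channelReadComplete` fires once the last message from the current read has been consumed. When requests execute on the I/O thread, one common way to batch is to write each response and flush once per read pass. The stdlib-only sketch below models that pattern; it is not a description of the branch's code. `BatchingSketch`, `FakeChannel` and `handleReadBatch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    /** Hypothetical channel: write() only buffers, flush() does one simulated socket write. */
    static final class FakeChannel {
        private final StringBuilder pending = new StringBuilder();
        final List<String> socketWrites = new ArrayList<>();

        void write(String response) {
            pending.append(response);
        }

        void flush() {
            if (pending.length() > 0) {
                socketWrites.add(pending.toString());
                pending.setLength(0);
            }
        }
    }

    /** Handles the requests delivered by one event-loop read pass. */
    static void handleReadBatch(FakeChannel ch, List<String> requests, boolean flushEach) {
        for (String req : requests) {
            ch.write("ok:" + req + ";");   // execute inline and buffer the response
            if (flushEach) {
                ch.flush();                 // one socket write per response
            }
        }
        ch.flush();                         // analogous to flushing in channelReadComplete
    }

    public static void main(String[] args) {
        List<String> batch = List.of("q1", "q2", "q3", "q4");
        FakeChannel perResponse = new FakeChannel();
        handleReadBatch(perResponse, batch, true);
        FakeChannel batched = new FakeChannel();
        handleReadBatch(batched, batch, false);
        System.out.println(perResponse.socketWrites.size() + " vs " + batched.socketWrites.size());
    }
}
```

With four requests in one read pass, flushing per response issues four socket writes, while flushing once per pass issues a single write containing all four responses.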

I still need to do further testing on our cstar clusters, but initial results look promising and
the code is cleaner.

{code}
--Current trunk--

./bin/cassandra -f  152.08s user 25.14s system 173% cpu 1:42.06 total

Results:
op rate                   : 45473
partition rate            : 45473
row rate                  : 45473
latency mean              : 20.2
latency median            : 16.9
latency 95th percentile   : 48.4
latency 99th percentile   : 75.2
latency 99.9th percentile : 116.4
latency max               : 352.5
total gc count            : 39
total gc mb               : 12477
total gc time (s)         : 2
avg gc time(ms)           : 43
stdev gc time(ms)         : 10
Total operation time      : 00:00:33
Improvement over 609 threadCount: 3%
             id, total ops , adj row/s,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr,  gc: #,  max ms,  sum ms,  sdv ms,      mb
  4 threadCount, 372704    ,        -0,   12150,   12150,   12150,     0.3,     0.3,     0.5,     0.9,     4.0,    44.7,   30.7,  0.01078,     20,     625,     625,       5,    6352
  8 threadCount, 566480    ,     18307,   18289,   18289,   18289,     0.4,     0.4,     0.7,     1.3,     5.6,    56.9,   31.0,  0.01124,     23,     781,     781,       6,    7320
 16 threadCount, 771731    ,     24763,   24739,   24739,   24739,     0.6,     0.5,     1.2,     2.5,    18.3,    66.5,   31.2,  0.01758,     25,     885,     885,       7,    7980
 24 threadCount, 916588    ,     29341,   29312,   29312,   29312,     0.8,     0.6,     1.6,     3.7,    12.3,    52.5,   31.3,  0.01256,     26,     899,     899,       6,    8308
 36 threadCount, 1039678   ,     33081,   33068,   33068,   33068,     1.1,     0.8,     2.2,     5.9,    23.5,    59.2,   31.4,  0.00985,     29,     986,     986,       7,    9271
 54 threadCount, 1123610   ,     35823,   35780,   35780,   35780,     1.5,     1.1,     3.2,     8.9,    36.1,    83.6,   31.4,  0.02015,     30,    1104,    1169,      11,    9581
 81 threadCount, 1185809   ,        -0,   37260,   37260,   37260,     2.2,     1.6,     4.7,    13.0,    44.0,   300.7,   31.8,  0.01640,     32,    1074,    1169,       8,   10235
121 threadCount, 1275470   ,        -0,   40124,   40124,   40124,     3.0,     2.4,     6.3,    14.7,    43.7,    71.0,   31.8,  0.01488,     33,    1053,    1152,       7,   10556
181 threadCount, 1326379   ,     41472,   41413,   41413,   41413,     4.4,     3.5,     9.3,    22.1,    51.4,    84.0,   32.0,  0.01061,     34,    1116,    1277,       7,   10876
271 threadCount, 1340955   ,     41138,   41060,   41060,   41060,     6.6,     4.9,    17.8,    36.3,    63.2,   125.8,   32.7,  0.01902,     35,    1234,    1375,      11,   11191
406 threadCount, 1418529   ,     43033,   42978,   42978,   42978,     9.4,     7.3,    24.3,    44.2,    81.3,   299.8,   33.0,  0.01606,     36,    1172,    1432,       9,   11517
609 threadCount, 1465946   ,     44234,   44159,   44159,   44159,    13.8,    11.6,    33.5,    53.3,    94.5,   135.8,   33.2,  0.01105,     37,    1183,    1428,       9,   11837
913 threadCount, 1544584   ,     45547,   45473,   45473,   45473,    20.2,    16.9,    48.4,    75.2,   116.4,   352.5,   34.0,  0.00953,     39,    1324,    1663,      10,   12477
END

--Netty fixes--
./bin/cassandra -f  110.27s user 13.83s system 116% cpu 1:46.80 total

Results:
op rate                   : 45506
partition rate            : 45506
row rate                  : 45506
latency mean              : 20.1
latency median            : 17.3
latency 95th percentile   : 40.6
latency 99th percentile   : 69.4
latency 99.9th percentile : 148.4
latency max               : 261.7
total gc count            : 38
total gc mb               : 12154
total gc time (s)         : 2
avg gc time(ms)           : 40
stdev gc time(ms)         : 10
Total operation time      : 00:00:33
Improvement over 609 threadCount: 2%
             id, total ops , adj row/s,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr,  gc: #,  max ms,  sum ms,  sdv ms,      mb
  4 threadCount, 549137    ,        -0,   17810,   17810,   17810,     0.2,     0.2,     0.3,     0.7,     4.0,    50.8,   30.8,  0.01549,     26,     843,     843,       5,    8259
  8 threadCount, 712047    ,     22991,   22988,   22988,   22988,     0.3,     0.3,     0.5,     1.1,     5.0,    75.6,   31.0,  0.01156,     24,     896,     896,      10,    7643
 16 threadCount, 854794    ,     27261,   27261,   27261,   27261,     0.6,     0.5,     0.8,     1.6,     6.7,   675.7,   31.4,  0.00871,     25,     865,     865,       6,    7983
 24 threadCount, 937211    ,     30181,   30151,   30151,   30151,     0.8,     0.7,     1.1,     2.1,     9.9,    63.7,   31.1,  0.00833,     25,     897,     897,       9,    7990
 36 threadCount, 991671    ,     31797,   31791,   31791,   31791,     1.1,     1.0,     1.6,     2.8,    30.6,    60.6,   31.2,  0.00628,     26,     860,     860,       4,    8313
 54 threadCount, 1004934   ,     32155,   32124,   32124,   32124,     1.7,     1.5,     2.3,     4.2,    33.4,    82.7,   31.3,  0.01421,     26,    1019,    1019,      13,    8303
 81 threadCount, 1060294   ,        -0,   33734,   33734,   33734,     2.4,     2.2,     3.3,     5.3,    34.1,    54.7,   31.4,  0.00888,     27,     866,     866,       4,    8636
121 threadCount, 1065139   ,        -0,   33931,   33931,   33931,     3.6,     3.3,     4.9,     7.6,    38.0,    83.7,   31.4,  0.00646,     27,     954,     954,      10,    8636
181 threadCount, 1058899   ,        -0,   33635,   33635,   33635,     5.4,     4.9,     7.4,    11.9,    49.0,   626.1,   31.5,  0.01520,     27,     944,     944,       8,    8637
271 threadCount, 1219920   ,        -0,   38376,   38376,   38376,     7.1,     6.1,    11.0,    25.6,    74.0,   532.4,   31.8,  0.01698,     31,    1198,    1296,      16,    9904
406 threadCount, 1438641   ,     44469,   44426,   44426,   44426,     9.1,     8.0,    14.7,    39.7,    76.4,   595.8,   32.4,  0.01055,     36,    1151,    1385,      10,   11504
609 threadCount, 1502339   ,     44705,   44705,   44705,   44705,    13.6,    11.6,    27.6,    62.1,   105.3,   156.2,   33.6,  0.01404,     38,    1325,    1625,      12,   12095
913 threadCount, 1529752   ,        -0,   45506,   45506,   45506,    20.1,    17.3,    40.6,    69.4,   148.4,   261.7,   33.6,  0.01656,     38,    1221,    1509,      10,   12154
{code}
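The headline comparisons can be recomputed from the two runs above. This small program (hypothetical name `StressComparison`) uses the op/s column and the `time` output for the server process. The CPU figures cover the whole server run across all thread counts, not any single step.

```java
import java.util.Locale;

class StressComparison {
    /** Percent change going from `before` to `after`. */
    static double pctChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // user + system CPU seconds reported by `time` for the whole server process
        double trunkCpu = 152.08 + 25.14;   // current trunk
        double nettyCpu = 110.27 + 13.83;   // netty fixes
        System.out.printf(Locale.ROOT, "CPU time: %+.1f%%%n", pctChange(trunkCpu, nettyCpu));

        int[] threads  = {4, 8, 16, 36, 913};
        double[] trunk = {12150, 18289, 24739, 33068, 45473};  // op/s, current trunk
        double[] netty = {17810, 22988, 27261, 31791, 45506};  // op/s, netty fixes
        for (int i = 0; i < threads.length; i++) {
            System.out.printf(Locale.ROOT, "%3d threads: %+.1f%% op/s%n",
                    threads[i], pctChange(trunk[i], netty[i]));
        }
    }
}
```

It reports about 30% less CPU time and roughly +47%, +26% and +10% op/s at 4, 8 and 16 threads. At 36 threads it shows about -4%, and at 913 threads essentially no change.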



> Upgrade netty
> -------------
>
>                 Key: CASSANDRA-7761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7761
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>            Priority: Minor
>             Fix For: 2.1.1
>
>
> The latest Netty release contains the proper fix for CASSANDRA-7695 plus some of the performance
> patches [~benedict] contributed. We should upgrade to it after extensive burn-in testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
