cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7217) Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads
Date Thu, 12 Nov 2015 23:24:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003017#comment-15003017 ]

Ariel Weisberg edited comment on CASSANDRA-7217 at 11/12/15 11:24 PM:
----------------------------------------------------------------------

Performance counters

2000 threads
{code}
Results:
op rate                   : 20576 [WRITE:20576]
partition rate            : 20576 [WRITE:20576]
row rate                  : 20576 [WRITE:20576]
latency mean              : 97.2 [WRITE:97.2]
latency median            : 91.0 [WRITE:91.0]
latency 95th percentile   : 179.1 [WRITE:179.1]
latency 99th percentile   : 268.3 [WRITE:268.3]
latency 99.9th percentile : 499.0 [WRITE:499.0]
latency max               : 1123.2 [WRITE:1123.2]
Total partitions          : 19000000 [WRITE:19000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:15:23
END

 Performance counter stats for './cassandra-stress write n=19000000 -rate threads=2000 -mode native cql3 -node 192.168.1.9':

 3,236,123,141,155      cycles                    #    2.115 GHz                     [16.14%]
 2,580,132,815,701      instructions              #    0.80  insns per cycle        
                                                  #    0.89  stalled cycles per insn [21.45%]
    63,994,020,523      cache-references          #   41.828 M/sec                   [26.72%]
    12,523,946,172      cache-misses              #   19.570 % of all cache refs     [32.00%]
 2,294,356,584,027      idle-cycles-frontend      #   70.90% frontend cycles idle    [37.28%]
 1,636,932,476,246      idle-cycles-backend       #   50.58% backend  cycles idle    [42.54%]
    1529337.521837      cpu-clock (msec)                                            
    1529938.883184      task-clock (msec)         #    1.635 CPUs utilized          
           129,217      page-faults               #    0.084 K/sec                  
        87,687,956      cs                        #    0.057 M/sec                  
        36,591,482      migrations                #    0.024 M/sec                  
           129,132      minor-faults              #    0.084 K/sec                  
   360,467,544,173      branch-instructions       #  235.609 M/sec                   [47.81%]
     5,205,849,494      branch-misses             #    1.44% of all branches         [47.76%]
    67,636,847,959      L1-dcache-load-misses     #   44.209 M/sec                   [47.83%]
    24,113,350,939      L1-dcache-store-misses    #   15.761 M/sec                   [47.94%]
    18,928,905,359      L1-dcache-prefetch-misses #   12.372 M/sec                   [42.84%]
    56,721,903,854      L1-icache-load-misses     #   37.075 M/sec                   [42.94%]
     3,977,754,938      dTLB-load-misses          #    2.600 M/sec                   [42.96%]
       748,817,996      dTLB-store-misses         #    0.489 M/sec                   [42.93%]
       791,352,271      iTLB-load-misses          #    0.517 M/sec                   [42.86%]
     5,414,521,445      branch-load-misses        #    3.539 M/sec                   [42.80%]
    37,275,666,810      LLC-loads                 #   24.364 M/sec                   [42.83%]
    10,226,436,059      LLC-stores                #    6.684 M/sec                   [42.80%]
    16,548,689,552      LLC-prefetches            #   10.817 M/sec                   [10.57%]

     935.835191719 seconds time elapsed
{code}
500 threads
{code}
Results:
op rate                   : 63563 [WRITE:63563]
partition rate            : 63563 [WRITE:63563]
row rate                  : 63563 [WRITE:63563]
latency mean              : 7.9 [WRITE:7.9]
latency median            : 5.8 [WRITE:5.8]
latency 95th percentile   : 16.2 [WRITE:16.2]
latency 99th percentile   : 36.3 [WRITE:36.3]
latency 99.9th percentile : 74.0 [WRITE:74.0]
latency max               : 422.0 [WRITE:422.0]
Total partitions          : 19000000 [WRITE:19000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:04:58
END

 Performance counter stats for './cassandra-stress write n=19000000 -rate threads=500 -mode native cql3 -node 192.168.1.9':

 1,967,800,644,333      cycles                    #    2.424 GHz                     [16.23%]
 1,939,192,725,937      instructions              #    0.99  insns per cycle        
                                                  #    0.67  stalled cycles per insn [21.56%]
    29,961,702,909      cache-references          #   36.915 M/sec                   [26.87%]
     7,138,097,546      cache-misses              #   23.824 % of all cache refs     [32.16%]
 1,290,923,581,701      idle-cycles-frontend      #   65.60% frontend cycles idle    [37.44%]
   827,710,334,443      idle-cycles-backend       #   42.06% backend  cycles idle    [42.67%]
     811637.475308      cpu-clock (msec)                                            
     811646.201981      task-clock (msec)         #    2.618 CPUs utilized          
            79,867      page-faults               #    0.098 K/sec                  
        34,954,827      cs                        #    0.043 M/sec                  
         1,803,328      migrations                #    0.002 M/sec                  
            79,531      minor-faults              #    0.098 K/sec                  
   216,302,396,604      branch-instructions       #  266.498 M/sec                   [47.89%]
     2,293,191,606      branch-misses             #    1.06% of all branches         [47.75%]
    36,684,160,264      L1-dcache-load-misses     #   45.197 M/sec                   [47.69%]
    15,585,249,129      L1-dcache-store-misses    #   19.202 M/sec                   [47.62%]
    14,137,121,831      L1-dcache-prefetch-misses #   17.418 M/sec                   [42.28%]
    33,608,185,424      L1-icache-load-misses     #   41.407 M/sec                   [42.28%]
     2,489,611,820      dTLB-load-misses          #    3.067 M/sec                   [42.26%]
       371,870,411      dTLB-store-misses         #    0.458 M/sec                   [42.27%]
       512,108,974      iTLB-load-misses          #    0.631 M/sec                   [42.28%]
     2,280,308,348      branch-load-misses        #    2.809 M/sec                   [42.31%]
    16,344,737,798      LLC-loads                 #   20.138 M/sec                   [42.38%]
     3,477,812,875      LLC-stores                #    4.285 M/sec                   [42.43%]
     9,526,173,996      LLC-prefetches            #   11.737 M/sec                   [10.69%]

     310.036724914 seconds time elapsed
{code}
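
To make the two runs above easier to compare, here is a rough sketch that normalizes a few of the raw counters per write. The figures are copied from the output above; the names (RUNS, OPS, derived) are mine and purely illustrative. Keep in mind that perf was attached to the cassandra-stress client process, and that the bracketed percentages suggest the hardware events were multiplexed, so the counts are scaled estimates rather than exact values.

```python
# Figures copied from the 2000-thread and 500-thread runs above.
# Each run wrote 19,000,000 partitions ("Total partitions").
OPS = 19_000_000

RUNS = {
    2000: {
        "op_rate": 20576,
        "cycles": 3_236_123_141_155,
        "instructions": 2_580_132_815_701,
        "cs": 87_687_956,
        "migrations": 36_591_482,
    },
    500: {
        "op_rate": 63563,
        "cycles": 1_967_800_644_333,
        "instructions": 1_939_192_725_937,
        "cs": 34_954_827,
        "migrations": 1_803_328,
    },
}


def derived(run):
    """Normalize raw perf counters so runs of different length compare directly."""
    return {
        "ipc": run["instructions"] / run["cycles"],
        "instructions_per_op": run["instructions"] / OPS,
        "cs_per_op": run["cs"] / OPS,
        "migrations_per_op": run["migrations"] / OPS,
    }


metrics = {threads: derived(run) for threads, run in RUNS.items()}

# How much faster the 500-thread run completed writes than the 2000-thread run.
throughput_ratio = RUNS[500]["op_rate"] / RUNS[2000]["op_rate"]

for threads, m in metrics.items():
    print(
        f"{threads:>5} threads: ipc={m['ipc']:.2f} "
        f"insns/op={m['instructions_per_op']:.0f} "
        f"cs/op={m['cs_per_op']:.2f} "
        f"migrations/op={m['migrations_per_op']:.3f}"
    )
print(f"500 vs 2000 threads op rate: {throughput_ratio:.2f}x")
```

The IPC values reproduce what perf already reports (0.80 and 0.99). Per write, the 2000-thread run shows roughly 4.6 context switches and 1.9 CPU migrations, against roughly 1.8 and 0.09 at 500 threads, while the op rate is about 3x lower.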


was (Author: aweisberg):
Performance counters

2000 threads
{code}
Results:
op rate                   : 19419 [WRITE:19419]
partition rate            : 19419 [WRITE:19419]
row rate                  : 19419 [WRITE:19419]
latency mean              : 103.0 [WRITE:103.0]
latency median            : 91.3 [WRITE:91.3]
latency 95th percentile   : 179.4 [WRITE:179.4]
latency 99th percentile   : 252.3 [WRITE:252.3]
latency 99.9th percentile : 428.5 [WRITE:428.5]
latency max               : 57651.8 [WRITE:57651.8]
Total partitions          : 19000000 [WRITE:19000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:16:18
END

 Performance counter stats for './cassandra-stress write n=19000000 -rate threads=2000 -mode native cql3 -node 192.168.1.9':

 3,320,451,421,007      cycles                    #    2.192 GHz                     [15.41%]
 2,563,758,232,484      instructions              #    0.77  insns per cycle        
                                                  #    0.94  stalled cycles per insn [20.47%]
    69,188,067,241      cache-references          #   45.664 M/sec                   [25.56%]
    13,456,198,724      cache-misses              #   19.449 % of all cache refs     [30.60%]
   131,776,347,830      bus-cycles                #   86.973 M/sec                   [35.65%]
 2,415,412,133,089      idle-cycles-frontend      #   72.74% frontend cycles idle    [40.69%]
 1,750,197,198,741      idle-cycles-backend       #   52.71% backend  cycles idle    [45.75%]
    1514363.238593      cpu-clock (msec)                                            
    1515146.390785      task-clock (msec)         #    1.530 CPUs utilized          
           154,815      page-faults               #    0.102 K/sec                  
        87,357,050      cs                        #    0.058 M/sec                  
        37,030,093      migrations                #    0.024 M/sec                  
           154,691      minor-faults              #    0.102 K/sec                  
                 0      major-faults              #    0.000 K/sec                  
                 0      alignment-faults          #    0.000 K/sec                  
                 0      emulation-faults          #    0.000 K/sec                  
   358,579,878,595      branch-instructions       #  236.664 M/sec                   [45.74%]
     5,088,330,722      branch-misses             #    1.42% of all branches         [45.80%]
    70,350,080,393      L1-dcache-load-misses     #   46.431 M/sec                   [45.92%]
    24,626,765,787      L1-dcache-store-misses    #   16.254 M/sec                   [40.88%]
    19,812,757,638      L1-dcache-prefetch-misses #   13.076 M/sec                   [40.97%]
    59,285,911,291      L1-icache-load-misses     #   39.129 M/sec                   [40.92%]
     4,437,071,985      dTLB-load-misses          #    2.928 M/sec                   [40.90%]
       821,151,709      dTLB-store-misses         #    0.542 M/sec                   [40.80%]
     1,188,402,914      iTLB-load-misses          #    0.784 M/sec                   [40.66%]
     5,274,857,779      branch-load-misses        #    3.481 M/sec                   [40.58%]
    39,293,189,238      LLC-loads                 #   25.934 M/sec                   [40.47%]
    10,625,403,856      LLC-stores                #    7.013 M/sec                   [40.45%]
    16,978,686,645      LLC-prefetches            #   11.206 M/sec                   [10.08%]

     990.019887601 seconds time elapsed
{code}
500 threads
{code}
Results:
op rate                   : 63678 [WRITE:63678]
partition rate            : 63678 [WRITE:63678]
row rate                  : 63678 [WRITE:63678]
latency mean              : 7.8 [WRITE:7.8]
latency median            : 5.6 [WRITE:5.6]
latency 95th percentile   : 16.8 [WRITE:16.8]
latency 99th percentile   : 36.5 [WRITE:36.5]
latency 99.9th percentile : 77.5 [WRITE:77.5]
latency max               : 358.8 [WRITE:358.8]
Total partitions          : 19000000 [WRITE:19000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:04:58
END

 Performance counter stats for './cassandra-stress write n=19000000 -rate threads=500 -mode native cql3 -node 192.168.1.9':

 2,055,138,822,781      cycles                    #    2.519 GHz                     [15.25%]
 1,923,953,212,761      instructions              #    0.94  insns per cycle        
                                                  #    0.71  stalled cycles per insn [20.30%]
    31,745,552,527      cache-references          #   38.904 M/sec                   [25.33%]
     6,931,345,766      cache-misses              #   21.834 % of all cache refs     [30.35%]
    79,818,924,716      bus-cycles                #   97.818 M/sec                   [35.35%]
 1,374,763,901,585      idle-cycles-frontend      #   66.89% frontend cycles idle    [40.37%]
   891,429,827,525      idle-cycles-backend       #   43.38% backend  cycles idle    [45.35%]
     815994.442406      cpu-clock (msec)                                            
     815998.411396      task-clock (msec)         #    2.635 CPUs utilized          
            84,202      page-faults               #    0.103 K/sec                  
        34,375,605      cs                        #    0.042 M/sec                  
         1,661,307      migrations                #    0.002 M/sec                  
            83,803      minor-faults              #    0.103 K/sec                  
                 0      major-faults              #    0.000 K/sec                  
                 0      alignment-faults          #    0.000 K/sec                  
                 0      emulation-faults          #    0.000 K/sec                  
   219,082,315,466      branch-instructions       #  268.484 M/sec                   [45.30%]
     2,321,109,537      branch-misses             #    1.06% of all branches         [45.35%]
    37,321,647,256      L1-dcache-load-misses     #   45.737 M/sec                   [45.40%]
    15,702,399,931      L1-dcache-store-misses    #   19.243 M/sec                   [40.39%]
    14,082,194,661      L1-dcache-prefetch-misses #   17.258 M/sec                   [40.47%]
    35,512,444,743      L1-icache-load-misses     #   43.520 M/sec                   [40.47%]
     2,048,574,473      dTLB-load-misses          #    2.511 M/sec                   [40.46%]
       338,040,710      dTLB-store-misses         #    0.414 M/sec                   [40.47%]
       680,218,846      iTLB-load-misses          #    0.834 M/sec                   [40.47%]
     2,316,842,085      branch-load-misses        #    2.839 M/sec                   [40.44%]
    16,883,500,935      LLC-loads                 #   20.691 M/sec                   [40.41%]
     3,542,330,824      LLC-stores                #    4.341 M/sec                   [40.37%]
     9,938,493,897      LLC-prefetches            #   12.180 M/sec                   [10.04%]

     309.643226007 seconds time elapsed
{code}

> Native transport performance (with cassandra-stress) drops precipitously past around 1000 threads
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7217
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7217
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance, stress, triaged
>             Fix For: 3.1
>
>
> This is obviously bad. Let's figure out why it's happening and put a stop to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)