cassandra-commits mailing list archives

From "Chris Lohfink (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family
Date Fri, 16 May 2014 11:18:57 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]

Chris Lohfink commented on CASSANDRA-7247:
------------------------------------------

The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, but
in this implementation I found it to be ~5x slower than a synchronized block around the offer
of the non-thread-safe one. ConcurrentStreamSummary did perform similarly when it was also
wrapped in a synchronized block, but once access is serialized it loses any benefit of being a
concurrent implementation, so I think the faster implementation is best.
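A minimal sketch of the approach argued for above, assuming a hypothetical `TopKeysTracker` class (this is not the code in patch.diff): the summary is not thread safe, so every `offer` goes through a single `synchronized` method. To keep the example dependency-free, a plain `HashMap` counter stands in for stream-lib's `StreamSummary`; only the locking pattern is the point here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper that serializes all access to a non-thread-safe summary.
class TopKeysTracker<K> {
    // Stand-in for a non-thread-safe StreamSummary: HashMap is unsafe under concurrent writes.
    private final Map<K, Long> counts = new HashMap<>();

    // One monitor guards every offer, so concurrent callers are serialized.
    public synchronized void offer(K key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Reads take the same lock so they see a consistent view.
    public synchronized long count(K key) {
        return counts.getOrDefault(key, 0L);
    }

    public synchronized long total() {
        long sum = 0;
        for (long c : counts.values()) {
            sum += c;
        }
        return sum;
    }
}

class SynchronizedOfferDemo {
    public static void main(String[] args) throws InterruptedException {
        TopKeysTracker<String> tracker = new TopKeysTracker<>();
        int threads = 8, perThread = 10_000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    tracker.offer("key" + (i % 10));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // No update is lost because every offer runs under the same lock.
        System.out.println(tracker.total()); // prints 80000
    }
}
```

The design choice mirrors the comment: a single coarse lock around a fast, non-concurrent structure, rather than a concurrent structure whose internal synchronization is wasted once callers are serialized anyway.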

Benchmarks run on a 2013 Retina MacBook Pro with a 500 GB SSD:

{code:title=No Changes}
            id, ops       ,    op/s,   key/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr
 4 threadCount, 634450    ,   21692,   21692,     0.2,     0.2,     0.2,     0.2,     0.4,   740.1,   29.2,  0.01188
 8 threadCount, 886600    ,   29762,   29762,     0.3,     0.2,     0.3,     0.4,     1.3,  1007.3,   29.8,  0.01220
16 threadCount, 912050    ,   29035,   29035,     0.5,     0.3,     0.9,     2.5,    11.2,  1393.8,   31.4,  0.01162
24 threadCount, 1022250   ,   32681,   32681,     0.7,     0.5,     1.0,     2.9,    13.5,  1126.5,   31.3,  0.00923
36 threadCount, 946550    ,   30900,   30900,     1.2,     0.8,     1.4,     3.0,    22.5,  1369.2,   30.6,  0.01089
{code}

{code:title=With Patch}
            id, ops       ,    op/s,   key/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr
 4 threadCount, 643900    ,   21700,   21700,     0.2,     0.2,     0.2,     0.2,     0.9,   941.1,   29.7,  0.01079
 8 threadCount, 942100    ,   32300,   32300,     0.2,     0.2,     0.3,     0.3,     1.2,   849.5,   29.2,  0.01519
16 threadCount, 907400    ,   30650,   30650,     0.5,     0.3,     0.8,     1.9,    10.7,  1124.0,   29.6,  0.01112
24 threadCount, 1026150   ,   31753,   31753,     0.7,     0.5,     0.9,     3.3,    20.6,  1299.0,   32.3,  0.01295
36 threadCount, 980600    ,   30077,   30077,     1.2,     0.8,     1.3,     2.7,    24.9,  1394.3,   32.6,  0.01747
{code}
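The op/s columns of the two tables can be compared directly. This small program (a hypothetical helper; the numbers are copied from the tables) computes the relative change of the patched run against the unpatched run at each thread count.

```java
// Hypothetical helper: relative op/s change, "With Patch" vs "No Changes".
class ThroughputDelta {
    static final int[] THREADS = {4, 8, 16, 24, 36};
    // op/s column of the "No Changes" table
    static final double[] BASELINE = {21692, 29762, 29035, 32681, 30900};
    // op/s column of the "With Patch" table
    static final double[] PATCHED = {21700, 32300, 30650, 31753, 30077};

    // Percentage change from before to after (positive = patched run is faster).
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("%2d threads: %+.1f%%%n",
                    THREADS[i], percentChange(BASELINE[i], PATCHED[i]));
        }
        // prints:
        //  4 threads: +0.0%
        //  8 threads: +8.5%
        // 16 threads: +5.6%
        // 24 threads: -2.8%
        // 36 threads: -2.7%
    }
}
```

On these single runs the patched build ranges from about 2.8% slower (24 threads) to about 8.5% faster (8 threads) than the unpatched build.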

> Provide top ten most frequent keys per column family
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7247
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Priority: Minor
>         Attachments: patch.diff
>
>
> Since we already have the nice addthis stream library, we can use it to keep track of the most
> frequent DecoratedKeys that come through the system using StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then provide a new metric to access them via JMX.
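The description relies on the Space-Saving algorithm behind stream-lib's StreamSummary. As a rough sketch, here is a simplified, single-threaded version (a hypothetical `SpaceSaving` class, not stream-lib's API). It keeps at most `capacity` counters; when all are in use, a new key evicts the key with the smallest count, inherits that count plus one, and records the inherited amount as its maximum overestimation error. The minimum is found by a linear scan, whereas a production summary would use a bucketed structure for constant-time updates. Exposing the result over JMX is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified Space-Saving summary. Not thread safe.
class SpaceSaving<T> {
    // Snapshot of one monitored item: count may overestimate the true
    // frequency, but by no more than error.
    static final class Counter<E> {
        final E item;
        final long count;
        final long error;

        Counter(E item, long count, long error) {
            this.item = item;
            this.count = count;
            this.error = error;
        }
    }

    private final int capacity;
    // item -> {count, error}
    private final Map<T, long[]> counters = new HashMap<>();

    SpaceSaving(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void offer(T item) {
        long[] c = counters.get(item);
        if (c != null) {                   // already monitored: increment its count
            c[0]++;
            return;
        }
        if (counters.size() < capacity) {  // free slot: start a new exact counter
            counters.put(item, new long[] {1, 0});
            return;
        }
        // All slots used: replace the item with the smallest count. The newcomer
        // inherits that count (+1), and the inherited part is its error bound.
        T minItem = null;
        long minCount = Long.MAX_VALUE;
        for (Map.Entry<T, long[]> e : counters.entrySet()) {
            if (e.getValue()[0] < minCount) {
                minCount = e.getValue()[0];
                minItem = e.getKey();
            }
        }
        counters.remove(minItem);
        counters.put(item, new long[] {minCount + 1, minCount});
    }

    // Up to k monitored items, highest estimated count first.
    List<Counter<T>> topK(int k) {
        List<Counter<T>> all = new ArrayList<>();
        for (Map.Entry<T, long[]> e : counters.entrySet()) {
            all.add(new Counter<>(e.getKey(), e.getValue()[0], e.getValue()[1]));
        }
        all.sort(Comparator.comparingLong((Counter<T> c) -> c.count).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}

class SpaceSavingDemo {
    public static void main(String[] args) {
        SpaceSaving<String> summary = new SpaceSaving<>(3);
        for (String key : new String[] {"a", "a", "a", "b", "b", "c", "d", "a", "b"}) {
            summary.offer(key);
        }
        for (SpaceSaving.Counter<String> c : summary.topK(2)) {
            System.out.println(c.item + " count=" + c.count + " error=" + c.error);
        }
        // prints:
        // a count=4 error=0
        // b count=3 error=0
    }
}
```

With capacity 3, `c` is evicted when `d` arrives, so `d` is reported with count 2 and error 1 (its true count is 1), while `a` (count 4) and `b` (count 3) are exact because they were never evicted.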



--
This message was sent by Atlassian JIRA
(v6.2#6252)
