cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13444) Fast and garbage-free Streaming Histogram
Date Mon, 22 May 2017 22:04:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020268#comment-16020268
] 

Jason Brown commented on CASSANDRA-13444:
-----------------------------------------

Got distracted last week, but back on this today.

So, one of the problems with the current patch is that, while you've optimized the creating
the histogram, there's a some memory waste when opening up the histogram for reading. Both
{{Spool}} and {{DistanceHolder}} always allocate their backing arrays, but they are only really
used when creating the histogram.

Thus, I've knocked up a version of your patch which splits out the creation of the {{StreamingHistogram}}
from the use of it. To that end, I've created {{StreamingHistogram.Builder}} to be responsbile
for contructing and taking updates to the histogram, and maintains instances of {{Spool}}
and {{DistanceHolder}}. When creation is complete or when you want to grab a snapshot, {{builder#build()}}
will give you a new copy of the {{StreamingHistogram}}. The reason why you may need a snapshot
is due to the sstable early open functionality, which is slightly non-obvious and we've had
tons of bugs with that in the past (hence why I decided to code this up). Let me know what
you think.

[fast-streaming-hist_builder|https://github.com/jasobrown/cassandra/commit/6de9484982b7e3ddcd54118393861598b537d6b4]
branch with the builder.

One further optimization I discovered that can be made is with the {{StreamingHistogramSerializer}}.
On both the ser/de paths, we eagerly reify an entire {{Map}}, and use {{Double}} as the key
(and use a double in the serialization!). With your patch, we can probably make this a heck
of a lot simpler and just dump the {{long[]}} to the file. This shouldn't be too difficult,
but requires changing the {{StreamingHistogramSerializer}} to implement {{IVersionedSerializer}}
to support backward compatibility. We can probably do this work in a follow up ticket.

> Fast and garbage-free Streaming Histogram
> -----------------------------------------
>
>                 Key: CASSANDRA-13444
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13444
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Fuud
>            Assignee: Fuud
>             Fix For: 4.x
>
>         Attachments: results.csv, results.xlsx
>
>
> StreamingHistogram is cause of high cpu usage and GC pressure.
> It was improved at CASSANDRA-13038 by introducing intermediate buffer to try accumulate
different values into the big map before merging them into smaller one.
> But there was not enought for TTL's distributed within large time. Rounding (also introduced
at 13038) can help but it reduce histogram precision specially in case where TTL's does not
distributed uniformly.
> There are several improvements that can help to reduce cpu and gc usage. Them all included
in the pool-request as separate revisions thus you can test them independently.
> Improvements list:
> # Use Map.computeIfAbsent instead of get->checkIfNull->put chain. In this way "add-or-accumulate"
operation takes one map operation instead of two. But this method (default-defined in interface
Map) is overriden in HashMap but not in TreeMap. Thus I changed spool type to HashMap.
> # As we round incoming values to _roundSeconds_ we can also round value on merge. It
will enlarge hit rate for bin operations.
> # Because we inserted only integers into Histogram and rounding values to integers we
can use *int* type everywhere.
> # Histogram takes huge amount of time merging values. In merge method largest amount
of time taken by finding nearest points. It can be eliminated by holding additional TreeSet
with differences, sorted from smalest to greatest.
> # Because we know max size of _bin_ and _differences_ maps we can replace them with sorted
arrays. Search can be done with _Arrays.binarySearch_ and insertion/deletions can be done
by _System.arraycopy_. Also it helps to merge some operations into one.
> # Because spool map is also limited we can replace it with open address primitive map.
It's finaly reduce allocation rate to zero.
> You can see gain given by each step in the attached file. First number is time for one
benchmark invocation and second - is allocation rate in Mb per operation.
> Dependends of payload time is reduced up to 90%.
> Overall gain:
> |.|.|Payload/SpoolSize|.|.|.|% from original
> |.|.|.|original|.|optimized|
> |.|.|secondInMonth/0|.|.|.|
> |time ms/op|.|.|10747,684|.|5545,063|51,6
> |allocation Mb/op|.|.|2441,38858|.|0,002105713|0
> |.|.|.|.|.|.|
> |.|.|secondInMonth/1000|.|.|.|
> |time ms/op|.|.|8988,578|.|5791,179|64,4
> |allocation Mb/op|.|.|2440,951141|.|0,017715454|0
> |.|.|.|.|.|.|
> |.|.|secondInMonth/10000|.|.|.|
> |time ms/op|.|.|10711,671|.|5765,243|53,8
> |allocation Mb/op|.|.|2437,022537|.|0,264083862|0
> |.|.|.|.|.|.|
> |.|.|secondInMonth/100000|.|.|.|
> |time ms/op|.|.|13001,841|.|5638,069|43,4
> |allocation Mb/op|.|.|2396,947113|.|2,003662109|0,1
> |.|.|.|.|.|.|
> |.|.|secondInDay/0|.|.|.|
> |time ms/op|.|.|10381,833|.|5497,804|53
> |allocation Mb/op|.|.|2441,166107|.|0,002105713|0
> |.|.|.|.|.|.|
> |.|.|secondInDay/1000|.|.|.|
> |time ms/op|.|.|8522,157|.|5929,871|69,6
> |allocation Mb/op|.|.|1973,112381|.|0,017715454|0
> |.|.|.|.|.|.|
> |.|.|secondInDay/10000|.|.|.|
> |time ms/op|.|.|10234,978|.|5480,077|53,5
> |allocation Mb/op|.|.|2306,057404|.|0,262969971|0
> |.|.|.|.|.|.|
> |.|.|secondInDay/100000|.|.|.|
> |time ms/op|.|.|2971,178|.|139,079|4,7
> |allocation Mb/op|.|.|172,1276245|.|2,001721191|1,2
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/0|.|.|.|
> |time ms/op|.|.|10663,123|.|5605,672|52,6
> |allocation Mb/op|.|.|2439,456818|.|0,002105713|0
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/1000|.|.|.|
> |time ms/op|.|.|9029,788|.|5838,618|64,7
> |allocation Mb/op|.|.|2331,839249|.|0,180664063|0
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/10000|.|.|.|
> |time ms/op|.|.|4862,409|.|89,001|1,8
> |allocation Mb/op|.|.|965,4871887|.|0,251711652|0
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/100000|.|.|.|
> |time ms/op|.|.|1484,454|.|95,044|6,4
> |allocation Mb/op|.|.|153,2464722|.|2,001712809|1,3
> |.|.|.|.|.|.|
> |.|.|secondInMin/0|.|.|.|
> |time ms/op|.|.|875,118|.|424,11|48,5
> |allocation Mb/op|.|.|610,3554993|.|0,001776123|0
> |.|.|.|.|.|.|
> |.|.|secondInMin/1000|.|.|.|
> |time ms/op|.|.|568,7|.|84,208|14,8
> |allocation Mb/op|.|.|0,007598114|.|0,01810023|238,2
> |.|.|.|.|.|.|
> |.|.|secondInMin/10000|.|.|.|
> |time ms/op|.|.|573,595|.|83,862|14,6
> |allocation Mb/op|.|.|0,007597351|.|0,252473872|3323,2
> |.|.|.|.|.|.|
> |.|.|secondInMin/100000|.|.|.|
> |time ms/op|.|.|584,457|.|86,554|14,8
> |allocation Mb/op|.|.|0,007595825|.|2,002506106|26363,2
> You may notice increased allocation rate for secondInMin payload. It is because test
use small values and Integer.valueOf use cache for them. In real case where incoming values
will be timestamps this cache will not work. Also constant memory 2 Mb per StreamingHistogram
is quite good.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message