aurora-reviews mailing list archives

From Zameer Manji <zma...@apache.org>
Subject Re: Review Request 52669: Move the H2 database off heap.
Date Thu, 09 Feb 2017 19:09:25 GMT


> On Feb. 9, 2017, 9:11 a.m., David McLaughlin wrote:
> > Early scale testing of this has been promising in terms of relieving GC pressure
> > (although not a silver bullet by any means), but query latency was orders of magnitude slower.
> > At certain task volumes (500k total tasks in the index) moving H2 off heap was the only way
> > I was able to successfully complete a snapshot without having ~40s GC pauses that led to ZK
> > session timeouts and scheduler failovers. I'd propose updating this patch so off-heap can
> > be turned on and off with a flag, as the hit to read performance may be unacceptable for read
> > heavy workloads with smaller clusters.
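
For illustration only, here is a rough sketch of what such a flag could boil down to, assuming the scheduler builds its H2 JDBC URL from a database name. The class, method, and parameter names are hypothetical and not taken from the patch; `mem:` is H2's regular on-heap in-memory mode and `nioMemFS:` is the off-heap in-memory file system discussed in the review.

```java
// Hypothetical helper; names are illustrative and not from the Aurora patch.
final class H2UrlSketch {
  private H2UrlSketch() {}

  /** Builds an H2 JDBC URL for an in-memory database, on heap or off heap. */
  static String jdbcUrl(String dbName, boolean offHeap) {
    // "nioMemFS:" keeps the data outside the JVM heap; "mem:" keeps it on heap.
    String prefix = offHeap ? "nioMemFS:" : "mem:";
    return "jdbc:h2:" + prefix + dbName;
  }

  public static void main(String[] args) {
    // Assumed command-line switch standing in for a real scheduler flag.
    boolean offHeap = args.length > 0 && Boolean.parseBoolean(args[0]);
    System.out.println(jdbcUrl("aurora", offHeap));
  }
}
```

With a switch like this, whether off-heap is the default becomes a one-line decision rather than a code change.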

Should this be something we enable by default?


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52669/#review164942
-----------------------------------------------------------


On Oct. 11, 2016, 11:17 a.m., Stephan Erb wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52669/
> -----------------------------------------------------------
> 
> (Updated Oct. 11, 2016, 11:17 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, John Sirois, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This experiment is inspired by David's comment: "I don’t think the
> storage engine matters. We just need to be able to offload it from
> the Scheduler JVM. The problem with H2 isn’t SQL or anything else,
> it’s the GC pressure."
> 
> The basic idea is to switch to another storage backend: "nioMemFS stores
> data outside of the VM's heap - useful for large memory DBs without
> incurring GC costs" (http://www.h2database.com/html/advanced.html)
> 
> Our micro-benchmarks look promising:
> 
> Current Master (on-heap db with latest versions):
> 
> Benchmark                                             (instanceOverrides)  (instances)  (metadata)  (numTasks)   Mode  Cnt      Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A       10000  thrpt    5  72851.249 ± 15794.210  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A       50000  thrpt    5  31626.929 ± 17326.988  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A      100000  thrpt    5      0.078 ±     0.013  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A       10000  thrpt    5    414.135 ±   315.838  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A       50000  thrpt    5     68.643 ±    24.303  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A      100000  thrpt    5     32.032 ±    13.870  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         1000         N/A         N/A  thrpt    5    143.981 ±    78.985  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         5000         N/A         N/A  thrpt    5     35.224 ±    25.593  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A        10000         N/A         N/A  thrpt    5     18.869 ±     3.318  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                      1          N/A         N/A         N/A  thrpt    5     36.013 ±    19.743  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                     10          N/A         N/A         N/A  thrpt    5     33.813 ±    11.216  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                    100          N/A         N/A         N/A  thrpt    5     20.516 ±    10.526  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                   1000          N/A         N/A         N/A  thrpt    5     16.564 ±     2.993  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A          10         N/A  thrpt    5     32.399 ±    21.310  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A         100         N/A  thrpt    5     35.518 ±     7.468  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A        1000         N/A  thrpt    5     19.757 ±    10.035  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A       10000         N/A  thrpt    5     10.849 ±    10.660  ops/s
> 
> This patch (off-heap db):
> 
> Benchmark                                             (instanceOverrides)  (instances)  (metadata)  (numTasks)   Mode  Cnt      Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A       10000  thrpt    5  77746.436 ± 47191.240  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A       50000  thrpt    5  70099.087 ± 37223.642  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A      100000  thrpt    5  30461.428 ± 22964.261  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A       10000  thrpt    5    335.302 ±   229.328  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A       50000  thrpt    5     61.443 ±    24.280  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A      100000  thrpt    5     32.129 ±    13.067  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         1000         N/A         N/A  thrpt    5    411.866 ±   396.308  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         5000         N/A         N/A  thrpt    5     92.208 ±    65.238  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A        10000         N/A         N/A  thrpt    5     49.870 ±    44.519  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                      1          N/A         N/A         N/A  thrpt    5     90.074 ±    31.360  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                     10          N/A         N/A         N/A  thrpt    5    114.224 ±    48.576  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                    100          N/A         N/A         N/A  thrpt    5    108.767 ±    25.154  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                   1000          N/A         N/A         N/A  thrpt    5     45.995 ±    14.795  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A          10         N/A  thrpt    5     72.343 ±    77.696  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A         100         N/A  thrpt    5     84.984 ±    27.971  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A        1000         N/A  thrpt    5     89.819 ±    44.923  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A       10000         N/A  thrpt    5     31.670 ±    15.236  ops/s
> 
> 
> Please be aware: The values seen here will not really carry over to
> real world usage. It would therefore be awesome if one of you could
> test this on a larger cluster!
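
Given how wide the error columns are, one quick sanity check when reading these tables is whether the score ± error ranges of two runs overlap at all. JMH reports the error as the half-width of a 99.9% confidence interval, so overlapping ranges suggest the difference may just be noise. A minimal sketch, with a hypothetical class and method name and two rows copied from the tables above:

```java
// Hypothetical helper for eyeballing JMH results; not part of Aurora or JMH.
final class JmhIntervalSketch {
  private JmhIntervalSketch() {}

  /** Returns true if [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] overlap. */
  static boolean intervalsOverlap(double scoreA, double errA, double scoreB, double errB) {
    double loA = scoreA - errA;
    double hiA = scoreA + errA;
    double loB = scoreB - errB;
    double hiB = scoreB + errB;
    return loA <= hiB && loB <= hiA;
  }

  public static void main(String[] args) {
    // DBFetchTasksBenchmark, numTasks=10000: on-heap vs. off-heap.
    System.out.println(intervalsOverlap(72851.249, 15794.210, 77746.436, 47191.240));
    // JobInstructionsBenchmark, instanceOverrides=100: on-heap vs. off-heap.
    System.out.println(intervalsOverlap(20.516, 10.526, 108.767, 25.154));
  }
}
```

This is only a rough filter and no substitute for the larger-cluster test requested above.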
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DbModule.java e7287cec28e7b8ca978c506bfe821f261bc0ac26
> 
> Diff: https://reviews.apache.org/r/52669/diff/
> 
> 
> Testing
> -------
> 
> `./gradlew build`
> `./gradlew jmh -Pbenchmarks='UpdateStoreBenchmarks.*|TaskStoreBenchmarks.*'`
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>

