aurora-issues mailing list archives

From "Mehrdad Nurolahzade (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1847) Eliminate sequential scan in MemTaskStore.getJobKeys()
Date Tue, 06 Dec 2016 23:22:58 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727059#comment-15727059 ]

Mehrdad Nurolahzade commented on AURORA-1847:
---------------------------------------------

Obviously, this would no longer be a problem if/when our move to {{DBTaskStore}} is finalized.
We are going to revisit the impediments to such a move soon.

In the meantime, this can serve as a band-aid that speeds up loading of the scheduler
landing page by almost three orders of magnitude for us (results from my quick & dirty fix):
{code}
Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5  239816.089 ± 21423.880  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5  317320.217 ± 27734.522  ops/s
TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5  316582.626 ± 66012.270  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5  544172.191 ± 46109.756  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5  344869.887 ± 35948.155  ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5  345617.654 ± 51053.176  ops/s
{code}
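As a rough check of the "almost three orders of magnitude" figure, the sketch below divides the post-fix {{MemFetchTasksBenchmark}} scores above by the pre-fix scores quoted in the issue description. It assumes the two benchmark runs are directly comparable (same machine and JMH settings), which this thread does not state; the class name {{SpeedupSketch}} is hypothetical.

```java
// Hypothetical helper: speedup of MemFetchTasksBenchmark after vs. before the quick fix,
// using the throughput scores (ops/s) posted in this thread. Assumes both runs are comparable.
final class SpeedupSketch {
  static final int[] NUM_TASKS = {10_000, 50_000, 100_000};
  // ops/s before the fix (from the issue description).
  static final double[] BEFORE = {624.944, 91.335, 27.712};
  // ops/s after the quick fix (from the comment).
  static final double[] AFTER = {544172.191, 344869.887, 345617.654};

  // Ratio of post-fix to pre-fix throughput for row i.
  static double speedup(int i) {
    return AFTER[i] / BEFORE[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < NUM_TASKS.length; i++) {
      double ratio = speedup(i);
      // log10 of the ratio is the speedup expressed in orders of magnitude.
      System.out.printf("%,7d tasks: %,10.1fx (%.2f orders of magnitude)%n",
          NUM_TASKS[i], ratio, Math.log10(ratio));
    }
  }
}
```

Run as-is, it reports roughly 870x at 10,000 tasks, about 3,800x at 50,000 tasks, and about 12,500x at 100,000 tasks, so "almost three orders of magnitude" holds at the smallest size and the gap widens as the task count grows, consistent with the old code's cost growing with the number of tasks.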

> Eliminate sequential scan in MemTaskStore.getJobKeys()
> ------------------------------------------------------
>
>                 Key: AURORA-1847
>                 URL: https://issues.apache.org/jira/browse/AURORA-1847
>             Project: Aurora
>          Issue Type: Story
>          Components: Efficiency, UI
>            Reporter: Mehrdad Nurolahzade
>            Priority: Minor
>              Labels: newbie
>
> The existing {{TaskStoreBenchmarks}} shows {{DBTaskStore}} is more than two orders of magnitude faster than {{MemTaskStore}} when it comes to {{getJobKeys()}}:
> {code}
> Benchmark                                       (numTasks)   Mode  Cnt       Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run        10000  thrpt    5  320271.082 ± 30842.727  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run        50000  thrpt    5  334805.551 ± 20435.139  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run       100000  thrpt    5  317395.890 ± 45302.180  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       10000  thrpt    5     624.944 ±    54.038  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run       50000  thrpt    5      91.335 ±     9.241  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run      100000  thrpt    5      27.712 ±     8.128  ops/s
> {code}
> If the scheduler is configured to run with {{MemTaskStore}}, every hit on the scheduler page ({{/scheduler}}) causes a call to {{MemTaskStore.getJobKeys()}}.
> The implementation of this method is currently very inefficient: it performs a sequential scan of the task store and then maps each task to its job key. Both the scan and the mapping can be eliminated by simply returning the key set of the existing secondary index {{job}}.
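The proposed change can be sketched as follows. This is a simplified, hypothetical model, not Aurora's actual {{MemTaskStore}}: task IDs and job keys are plain strings, the class and method names ({{JobKeyIndexSketch}}, {{saveTask}}, {{deleteTask}}, {{getJobKeysByScan}}, {{getJobKeysFromIndex}}) are invented for illustration, and the job index is a plain {{HashMap}} rather than whatever index structure {{MemTaskStore}} really uses.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for an in-memory task store with a secondary "job" index.
final class JobKeyIndexSketch {
  // Primary store: task ID -> job key of that task.
  private final Map<String, String> tasks = new HashMap<>();
  // Secondary index: job key -> IDs of the tasks belonging to that job.
  private final Map<String, Set<String>> jobIndex = new HashMap<>();

  void saveTask(String taskId, String jobKey) {
    String previous = tasks.put(taskId, jobKey);
    if (previous != null) {
      unindex(taskId, previous); // the task may have moved to a different job
    }
    jobIndex.computeIfAbsent(jobKey, k -> new HashSet<>()).add(taskId);
  }

  void deleteTask(String taskId) {
    String jobKey = tasks.remove(taskId);
    if (jobKey != null) {
      unindex(taskId, jobKey);
    }
  }

  private void unindex(String taskId, String jobKey) {
    Set<String> ids = jobIndex.get(jobKey);
    ids.remove(taskId);
    // Drop the key once its last task is gone so the key set never reports a stale job.
    if (ids.isEmpty()) {
      jobIndex.remove(jobKey);
    }
  }

  // Old approach: visits every task and maps it to its job key, O(number of tasks).
  Set<String> getJobKeysByScan() {
    return tasks.values().stream().collect(Collectors.toSet());
  }

  // Proposed approach: read the index's key set directly, O(number of jobs) for the copy.
  Set<String> getJobKeysFromIndex() {
    return Collections.unmodifiableSet(new HashSet<>(jobIndex.keySet()));
  }
}
```

Both methods return the same set as long as the index is kept in sync on every save and delete; the index version only touches one entry per job instead of one per task. The sketch copies the key set rather than returning a live view so callers never observe concurrent mutations; whether a copy is needed in the real store depends on its locking, which this thread does not describe.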



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
