aurora-reviews mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Review Request 43457: Increase throughput of DbTaskStore
Date Fri, 12 Feb 2016 01:09:34 GMT


> On Feb. 11, 2016, 12:57 a.m., Bill Farner wrote:
> > It would be nice to hear how this change jibes with the opposite change made in
> > https://reviews.apache.org/r/42882
> 
> Maxim Khutornenko wrote:
>     I thought about that too. I think there are 2 major differences: the total number of
>     rows generated by the multi-join select statement and the number of required subselects.
>     In that RB, lowering the row count from 500k to just under 100, plus the low number of
>     required subselects, helped to unlock perf gains.
>     
>     In this particular scenario, it appears that the frequency of subselects trumps
>     everything else.
>     
>     Zameer, what's the overall number of rows returned by the select statement for a
>     single task in your case?
> 
> Zameer Manji wrote:
>     Running:
>     ````
>         SELECT
>           t.id AS row_id,
>           t.task_config_row_id AS task_config_row_id,
>           t.task_id AS task_id,
>           t.instance_id AS instance_id,
>           t.status AS status,
>           t.failure_count AS failure_count,
>           t.ancestor_task_id AS ancestor_id,
>           j.role AS c_j_role,
>           j.environment AS c_j_environment,
>           j.name AS c_j_name,
>           h.slave_id AS slave_id,
>           h.host AS slave_host,
>           tp.name as tp_name,
>           tp.port as tp_port,
>           te.timestamp_ms as te_timestamp,
>           te.status as te_status,
>           te.message as te_message,
>           te.scheduler_host as te_scheduler
>         FROM tasks AS t
>         INNER JOIN task_configs as c ON c.id = t.task_config_row_id
>         INNER JOIN job_keys AS j ON j.id = c.job_key_id
>         LEFT OUTER JOIN task_ports as tp ON tp.task_row_id = t.id
>         LEFT OUTER JOIN task_events as te ON te.task_row_id = t.id
>         LEFT OUTER JOIN host_attributes AS h ON h.id = t.slave_row_id
>         WHERE task_id = '1454546771388-zmanji-devel-labrat-237-0e52b4a9-a8da-4958-997f-7bbe3db6b5d2'
>     ````
>     
>     On a test cluster, this returns 4 rows for a task in the RUNNING state.
>     
>     This makes sense: a job typically does not allocate that many ports, and a task will
>     have fewer than 8 events.
>     
>     Further, running:
>     ````
>         SELECT
>           c.id AS id,
>           c.creator_user AS creator_user,
>           c.service AS is_service,
>           c.num_cpus AS num_cpus,
>           c.ram_mb AS ram_mb,
>           c.disk_mb AS disk_mb,
>           c.priority AS priority,
>           c.max_task_failures AS max_task_failures,
>           c.production AS production,
>           c.contact_email AS contact_email,
>           c.executor_name AS executor_name,
>           c.executor_data AS executor_data,
>           c.tier AS tier,
>           j.role AS j_role,
>           j.environment AS j_environment,
>           j.name AS j_name,
>           p.port_name AS p_port_name,
>           d.id AS c_id,
>           d.image AS c_image,
>           m.id AS m_id,
>           m.key AS m_key,
>           m.value AS m_value,
>           tc.id AS constraint_id,
>           tc.name AS constraint_name,
>           tlc.id AS constraint_l_id,
>           tlc.value AS constraint_l_limit,
>           tvc.id AS constraint_v_id,
>           tvc.negated AS constraint_v_negated,
>           tvcv.value as constraint_v_v_value
>         FROM task_configs AS c
>         INNER JOIN job_keys AS j ON j.id = c.job_key_id
>         LEFT OUTER JOIN task_config_requested_ports AS p ON p.task_config_id = c.id
>         LEFT OUTER JOIN task_config_docker_containers AS d ON d.task_config_id = c.id
>         LEFT OUTER JOIN task_config_metadata AS m ON m.task_config_id = c.id
>         LEFT OUTER JOIN task_constraints AS tc ON tc.task_config_id = c.id
>         LEFT OUTER JOIN limit_constraints as tlc ON tlc.constraint_id = tc.id
>         LEFT OUTER JOIN value_constraints as tvc ON tvc.constraint_id = tc.id
>         LEFT OUTER JOIN value_constraint_values AS tvcv ON tvcv.value_constraint_id = tvc.id
>         WHERE c.id = 1
>     ````
>     
>     This returns 2 rows for a task in the above job.
>     
>     I think this is because a typical job doesn't have that many constraints.

Thanks Zameer. This confirms my assumptions about row count vs. sub-select chattiness.
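
To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MyBatis. The three-table schema is a simplified, hypothetical stand-in for the real one, and the task, port and event counts are made up for illustration. It only counts statements and rows; it does not measure time.

```python
import sqlite3


def build_db(num_tasks, ports_per_task, events_per_task):
    """Create an in-memory DB with a simplified, hypothetical task schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, task_id TEXT);
        CREATE TABLE task_ports (task_row_id INTEGER, name TEXT, port INTEGER);
        CREATE TABLE task_events (task_row_id INTEGER, status TEXT);
    """)
    for t in range(1, num_tasks + 1):
        conn.execute("INSERT INTO tasks VALUES (?, ?)", (t, f"task-{t}"))
        for p in range(ports_per_task):
            conn.execute("INSERT INTO task_ports VALUES (?, ?, ?)",
                         (t, f"port{p}", 31000 + p))
        for e in range(events_per_task):
            conn.execute("INSERT INTO task_events VALUES (?, ?)",
                         (t, f"STATUS_{e}"))
    return conn


def fetch_with_join(conn):
    """One statement; each task yields (ports x events) rows."""
    rows = conn.execute("""
        SELECT t.id, tp.name, te.status
        FROM tasks AS t
        LEFT OUTER JOIN task_ports AS tp ON tp.task_row_id = t.id
        LEFT OUTER JOIN task_events AS te ON te.task_row_id = t.id
    """).fetchall()
    return 1, len(rows)


def fetch_with_subselects(conn):
    """One statement for the tasks, then two more statements per task."""
    task_ids = [row[0] for row in conn.execute("SELECT id FROM tasks")]
    queries, total_rows = 1, len(task_ids)
    for tid in task_ids:
        ports = conn.execute(
            "SELECT name FROM task_ports WHERE task_row_id = ?", (tid,)).fetchall()
        events = conn.execute(
            "SELECT status FROM task_events WHERE task_row_id = ?", (tid,)).fetchall()
        queries += 2
        total_rows += len(ports) + len(events)
    return queries, total_rows


conn = build_db(num_tasks=100, ports_per_task=2, events_per_task=2)
join_queries, join_rows = fetch_with_join(conn)
sub_queries, sub_rows = fetch_with_subselects(conn)
print(f"join:       {join_queries} statement(s), {join_rows} rows")
print(f"subselects: {sub_queries} statement(s), {sub_rows} rows")
```

With small child collections the join's row multiplication stays modest while the subselect approach issues a number of statements that grows linearly with the task count. With large child collections the balance can tip the other way.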


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43457/#review118790
-----------------------------------------------------------


On Feb. 11, 2016, 8:03 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43457/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2016, 8:03 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Profiling master indicated that the bottleneck was MyBatis populating ResultSets and
> populating the resulting objects. This patch removes subselects, which reduces the number
> of ResultSets, and stops populating objects via a constructor, which is slower than
> populating them via setters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/storage/db/views/DbAssginedPort.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/storage/db/views/DbAssignedTask.java 93722395ed9fcd22dcb12e34e648e6e410952d43
>   src/main/java/org/apache/aurora/scheduler/storage/db/views/DbScheduledTask.java 502a1fa6fc141df498f0f09af292ce24e269731d
>   src/main/resources/org/apache/aurora/scheduler/storage/db/TaskConfigMapper.xml b1394cf44b7ddafcbc47bb1968306d0b33293380
>   src/main/resources/org/apache/aurora/scheduler/storage/db/TaskMapper.xml ea469cce31544221c34ae05a1c65f71271985655
> 
> Diff: https://reviews.apache.org/r/43457/diff/
> 
> 
> Testing
> -------
> 
> Master:
> Benchmark                                      (numTasks)   Mode  Cnt   Score    Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run       10000  thrpt    5  44.052 ± 14.689  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run       50000  thrpt    5   0.179 ±  0.052  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run      100000  thrpt    5   0.087 ±  0.022  ops/s
> 
> This Patch:
> Benchmark                                      (numTasks)   Mode  Cnt   Score   Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run       10000  thrpt    5  51.531 ± 7.236  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run       50000  thrpt    5   7.370 ± 1.320  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run      100000  thrpt    5   2.143 ± 1.234  ops/s
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>
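
For reference, the throughput ratios implied by the two benchmark tables quoted above can be worked out with a short script. This is only a sketch: the scores are copied from the tables, the variable and function names are illustrative, and the reported error margins are ignored, so the ratios are rough.

```python
# Scores (ops/s) copied from the benchmark tables, keyed by numTasks.
master = {10000: 44.052, 50000: 0.179, 100000: 0.087}
patch = {10000: 51.531, 50000: 7.370, 100000: 2.143}


def speedups(before, after):
    """Return the after/before throughput ratio for each task count."""
    return {n: after[n] / before[n] for n in before}


ratios = speedups(master, patch)
for num_tasks, ratio in sorted(ratios.items()):
    print(f"{num_tasks:>6} tasks: {ratio:.1f}x")
```

On these numbers the patch is roughly 1.2x faster at 10000 tasks, about 41x at 50000 and about 25x at 100000.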

