sentry-issues mailing list archives

From "kalyan kumar kalvagadda (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SENTRY-2305) Optimize time taken for persistence HMS snapshot
Date Wed, 11 Jul 2018 13:50:00 GMT

    [ https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540113#comment-16540113 ]

kalyan kumar kalvagadda edited comment on SENTRY-2305 at 7/11/18 1:49 PM:
--------------------------------------------------------------------------

*Test Data:*
*Databases:* 2
*Tables in each Database*: 100
*Partitions in table*: 500 
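For scale, the test data above works out to the following counts. This assumes, for illustration only, that every table has the same 500 partitions:

```latex
\begin{aligned}
\text{tables}     &= 2 \times 100 = 200 \\
\text{partitions} &= 200 \times 500 = 100{,}000
\end{aligned}
```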

*Note:* The times in my tests should not be treated as standard, because I'm running them on my local machine. What we should be looking at is the relative difference.

Work in progress.
||Option||Description||Time Taken||
|1|no change|n sec|
|2|no change|n sec|
|3|no change|n sec|
|4|no change|n sec|

 


was (Author: kkalyan):
Test Data:
Databases: 2
Tables in each Database: 100
Partitions in table: 500 

Note: The times in my tests should not be treated as standard, because I'm running them on my local machine. What we should be looking at is the relative difference.

||Option||Description||Time Taken||
|1|no change|n sec|
|2|no change|n sec|
|3|no change|n sec|
|4|no change|n sec|

 

> Optimize time taken for persistence HMS snapshot 
> -------------------------------------------------
>
>                 Key: SENTRY-2305
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2305
>             Project: Sentry
>          Issue Type: Sub-task
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: kalyan kumar kalvagadda
>            Assignee: kalyan kumar kalvagadda
>            Priority: Major
>
> There are a couple of options:
> # Break the total snapshot into batches and persist them in parallel, each in a different transaction. Because Sentry uses the repeatable_read isolation level, we should be able to write to the same table in parallel. This introduces a problem if persisting any of the batches fails, so the approach needs additional logic to clean up a partially persisted snapshot. I'm evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came down by 60%.
> # Try disabling the L1 cache while persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions, because transactions that commit a large amount of data might take longer: they spend a lot of CPU cycles keeping the rollback log up to date.
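Option 1 (parallel batches in separate transactions, with cleanup on failure) can be sketched in Java, the language Sentry is written in. This is a minimal illustration under stated assumptions, not Sentry's implementation: `SnapshotStore`, `persistBatch`, `deleteSnapshot`, and the `batchSize`/`threads` parameters are hypothetical names, and each `persistBatch` call is assumed to run in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of option 1. All names here are hypothetical, not Sentry APIs. */
public class ParallelSnapshotPersister {

    /** Hypothetical storage abstraction; each call is assumed to be its own transaction. */
    public interface SnapshotStore {
        void persistBatch(long snapshotId, List<String> paths) throws Exception;
        void deleteSnapshot(long snapshotId) throws Exception;
    }

    /** Splits entries into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> entries, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += batchSize) {
            batches.add(new ArrayList<>(entries.subList(i, Math.min(i + batchSize, entries.size()))));
        }
        return batches;
    }

    /**
     * Persists all batches in parallel. Returns true on success; if any batch
     * fails, deletes the partially persisted snapshot and returns false.
     */
    public static boolean persist(SnapshotStore store, long snapshotId, List<String> paths,
                                  int batchSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<String> batch : partition(paths, batchSize)) {
                // Callable form (returns null) so persistBatch may throw checked exceptions.
                futures.add(pool.submit(() -> {
                    store.persistBatch(snapshotId, batch);
                    return null;
                }));
            }
            boolean failed = false;
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait for every batch, even after one has failed
                } catch (ExecutionException e) {
                    failed = true;
                }
            }
            if (failed) {
                // Only runs after all batches finished, so it cannot race with in-flight writes.
                store.deleteSnapshot(snapshotId);
                return false;
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch deliberately waits for every batch before cleaning up; deleting as soon as the first failure is seen could leave rows from batches that were still committing.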
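Option 3 (sequential persistence, one smaller transaction per batch) reduces to a simple loop. A minimal sketch, assuming a hypothetical `TransactionRunner` that commits each batch in its own transaction; this is not a Sentry API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of option 3. TransactionRunner is hypothetical, not a Sentry API. */
public class SequentialSnapshotPersister {

    /** Hypothetical unit of work; each call is assumed to run in and commit its own transaction. */
    public interface TransactionRunner {
        void runInTransaction(List<String> batch) throws Exception;
    }

    /** Persists entries in batches of batchSize and returns the number of transactions committed. */
    public static int persist(TransactionRunner runner, List<String> entries, int batchSize) throws Exception {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int committed = 0;
        for (int start = 0; start < entries.size(); start += batchSize) {
            int end = Math.min(start + batchSize, entries.size());
            // Each iteration is a separate, smaller transaction, so no single
            // transaction has to keep rollback state for the whole snapshot.
            runner.runInTransaction(new ArrayList<>(entries.subList(start, end)));
            committed++;
        }
        return committed;
    }
}
```

As with option 1, a failure part-way through leaves the earlier batches committed, so similar cleanup logic would likely be needed.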



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
