sentry-issues mailing list archives

From "kalyan kumar kalvagadda (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SENTRY-2249) Persist HMS Full Snapshot in batches.
Date Wed, 27 Jun 2018 22:15:00 GMT

    [ https://issues.apache.org/jira/browse/SENTRY-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525670#comment-16525670 ]

kalyan kumar kalvagadda edited comment on SENTRY-2249 at 6/27/18 10:14 PM:
---------------------------------------------------------------------------

*Here are two options I had in mind.*

*Option-1:* Persist the snapshot entities in batches. This may significantly reduce the number of DB
operations. Currently there is one DB operation per snapshot entry, which does not
scale.
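
A minimal sketch of the batching idea in Option-1, not Sentry's actual persistence code. It uses Python's built-in sqlite3 module as a stand-in for the real database; the table name `snapshot_paths`, its columns, and the helpers `batched` and `persist_snapshot` are hypothetical. It also assumes the whole snapshot is written in a single transaction, with one `executemany` call per batch.

```python
import sqlite3
from itertools import islice


def batched(entries, batch_size):
    """Yield successive lists of at most batch_size entries."""
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def persist_snapshot(conn, entries, batch_size=1000):
    """Write all entries in one transaction, one executemany call per batch.

    Returns the number of batches written.
    """
    batches = 0
    with conn:  # commits on success, rolls back if any batch raises
        for batch in batched(entries, batch_size):
            conn.executemany(
                "INSERT INTO snapshot_paths (authz_obj, path) VALUES (?, ?)",
                batch,
            )
            batches += 1
    return batches


# Hypothetical snapshot: 2500 (object, path) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot_paths (authz_obj TEXT, path TEXT)")
entries = [(f"db1.table{i}", f"/warehouse/db1/table{i}") for i in range(2500)]

batch_count = persist_snapshot(conn, entries, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM snapshot_paths").fetchone()[0]
print(batch_count, row_count)
```

With a batch size of 1000, the 2500 entries are written in three calls instead of 2500. The batch size is a parameter here so that it could be made configurable.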

*Option-2:* Break the total snapshot into batches and persist all of them in parallel in
different transactions. As we use the repeatable_read isolation level, we should be able
to have parallel writes on the same table. This raises an issue if persisting any of the
batches fails: the approach needs additional logic to clean up the partially persisted
snapshot.
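
A rough sketch of the control flow Option-2 would need, under stated assumptions. `InMemoryStore`, `persist_batch`, `delete_snapshot`, and `persist_in_parallel` are hypothetical names; the in-memory store only simulates one independent transaction per batch and says nothing about how a real database behaves under repeatable_read.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for the database; each call models one transaction."""

    def __init__(self, fail_on_batch=None):
        self._lock = threading.Lock()
        self.rows = []  # list of (snapshot_id, entry)
        self.fail_on_batch = fail_on_batch  # batch index that should fail, if any

    def persist_batch(self, snapshot_id, batch_index, batch):
        if batch_index == self.fail_on_batch:
            raise RuntimeError(f"batch {batch_index} failed")
        with self._lock:
            self.rows.extend((snapshot_id, entry) for entry in batch)

    def delete_snapshot(self, snapshot_id):
        """Remove every row that belongs to the given snapshot."""
        with self._lock:
            self.rows = [r for r in self.rows if r[0] != snapshot_id]


def persist_in_parallel(store, snapshot_id, entries, batch_size, workers=4):
    """Persist batches concurrently; clean up the snapshot if any batch fails."""
    batches = [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(store.persist_batch, snapshot_id, index, batch)
            for index, batch in enumerate(batches)
        ]
    # Leaving the "with" block waits for every submitted batch to finish.
    if any(f.exception() is not None for f in futures):
        store.delete_snapshot(snapshot_id)  # drop the partially persisted snapshot
        return False
    return True


entries = [f"/warehouse/t{i}" for i in range(10)]

ok_store = InMemoryStore()
ok = persist_in_parallel(ok_store, 1, entries, batch_size=3)

bad_store = InMemoryStore(fail_on_batch=2)
failed = persist_in_parallel(bad_store, 2, entries, batch_size=3)
print(ok, len(ok_store.rows), failed, len(bad_store.rows))
```

The cleanup step is what distinguishes this option from Option-1: because the batches commit independently, a single failure leaves rows behind that must be removed explicitly.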

 


was (Author: kkalyan):
*Here are two options I had in mind.*

*Option-1:* Persist the snapshot entities in batches. This may significantly reduce the number of DB
operations. Currently there is one DB operation per snapshot entry, which does not
scale.

*Option-2:* Break the total snapshot into batches and persist all of them in parallel in
different transactions. As we use the repeatable_read isolation level, we should be able
to have parallel writes on the same table. This raises an issue if persisting any of the
batches fails: the approach needs additional logic to clean up the partially persisted
snapshot. I'm evaluating this option.

 

> Persist HMS Full Snapshot in batches.
> -------------------------------------
>
>                 Key: SENTRY-2249
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2249
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: kalyan kumar kalvagadda
>            Assignee: kalyan kumar kalvagadda
>            Priority: Major
>         Attachments: SENTRY-2249.001.patch
>
>
> Currently each entry in the full snapshot of HMS is persisted one at a time. Instead,
> persistence could be optimized by writing the entries in batches. DB operations are
> expensive, so reducing the number of database operations should help. This would
> significantly decrease the time to persist the snapshot into the database.
> The size of the batch could be configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
