phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-3788) GLOBAL_MUTATION_BATCH_SIZE should reflect size of chunked batches
Date Mon, 22 May 2017 18:47:04 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019998#comment-16019998 ]

James Taylor edited comment on PHOENIX-3788 at 5/22/17 6:46 PM:
----------------------------------------------------------------

[~gjacoby] - we track metrics on the client that record, for a given query, how big the batch
size is (among many other metrics). These metrics drive the Splunk dashboards used to monitor
Phoenix. I think this corner case is important to fix because otherwise alarms may go off
when a user runs a DELETE with auto commit on. The client would report a huge batch size in
this case when in actuality the work is broken up on the server side into batches. Or it might
be the case that we're not reporting these metrics at all, which would be a hole, as we'd be
severely undercounting the bytes/rows committed.

Here's what I think the simplest fix would be:
- In both DeleteCompiler and UpsertCompiler, in the runOnServer branches (in the MutationPlan.execute
method), pass through the client-side batch size on a new scan attribute.
- In the UngroupedAggregateRegionObserver.doPostScannerOpen() method, look first for the new scan
attribute for the batch size and only use the server-side batch size config if it is not found.
- Create a new MutationState constructor that passes through the batch size and internally
calls GLOBAL_MUTATION_BYTES.update() in a loop (basically simulating the batches that the
server would have done).
- Use this new constructor in DeleteCompiler and UpsertCompiler for the runOnServer branches.
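
To make the steps above concrete, here is a rough, self-contained sketch of the idea. It is not
Phoenix code: a plain Map stands in for the scan's attributes, BatchSizeMetric stands in for
the global metric, and the attribute name and all method names are made up for illustration.
It only shows the two moving parts: the attribute lookup with fallback to the server config,
and the client-side loop that reports one sample per batch the server would have committed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; every name here is hypothetical, not a Phoenix API. */
final class BatchSizeSketch {
    // Hypothetical attribute key; the real patch would pick its own name.
    static final String BATCH_SIZE_ATTRIB = "_ClientMutationBatchSize";

    /** Stand-in for a client metric that records each reported batch size. */
    static final class BatchSizeMetric {
        final List<Long> samples = new ArrayList<>();

        void update(long value) {
            samples.add(value);
        }
    }

    /** Client side: attach the client batch size to the (stand-in) scan attributes. */
    static void setClientBatchSize(Map<String, byte[]> scanAttributes, int batchSize) {
        scanAttributes.put(BATCH_SIZE_ATTRIB, ByteBuffer.allocate(4).putInt(batchSize).array());
    }

    /** Server side: prefer the scan attribute; fall back to the server-side config value. */
    static int resolveBatchSize(Map<String, byte[]> scanAttributes, int serverConfigBatchSize) {
        byte[] raw = scanAttributes.get(BATCH_SIZE_ATTRIB);
        return raw == null ? serverConfigBatchSize : ByteBuffer.wrap(raw).getInt();
    }

    /** Client side: report one sample per batch, simulating the server's chunking. */
    static void reportSimulatedBatches(BatchSizeMetric metric, long totalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long remaining = totalRows;
        while (remaining > 0) {
            long chunk = Math.min(remaining, batchSize);
            metric.update(chunk); // full batches first, then the remainder
            remaining -= chunk;
        }
    }
}
```

With this sketch, a server-side DELETE of 250 rows and a client batch size of 100 would record
three samples (100, 100, 50) rather than a single sample of 250.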

FYI, [~samarthjain]. Thoughts? Is the batch size only tracked as a global metric?


> GLOBAL_MUTATION_BATCH_SIZE should reflect size of chunked batches
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-3788
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3788
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.10.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>             Fix For: 4.11.0
>
>         Attachments: PHOENIX-3788.patch
>
>
> As part of PHOENIX-541, we started chunking large MutationStates into multiple smaller
> batches transparently. However, the relevant metric, GLOBAL_MUTATION_BATCH_SIZE, is still
> updated with the total batch size, not the size of each chunk. This means you can't see the
> actual batch sizes that are being submitted to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
