phoenix-dev mailing list archives

From "Geoffrey Jacoby (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-541) Make mutable batch size bytes-based instead of row-based
Date Mon, 05 Dec 2016 23:16:58 GMT


Geoffrey Jacoby commented on PHOENIX-541:

Thanks for the feedback, [~jamestaylor]. Will get an updated patch up soon. 

One question: is it worth keeping a distinction between MAX_MUTATION_SIZE_ATTRIB and MUTATE_BATCH_SIZE_ATTRIB
in the _BYTES version? Couldn't the logic just be:

1. If a single mutation is bigger than MUTATE_BATCH_SIZE_BYTES, throw the IllegalArgumentException
in MutationState.throwIfTooBig rather than using MAX_MUTATION_SIZE as it currently does in
most cases. (And do the same in any similar checks elsewhere.) 
2. If each individual mutation is smaller than the threshold, make sure we commit the requests
in batches no larger than MUTATE_BATCH_SIZE_BYTES_ATTRIB, either in MutationState or in the
UngroupedAggregateRegionObserver. (And during the transition, also apply the existing logic
for row count until the deprecated properties are removed.) 
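
A minimal sketch of the two steps above, for illustration only. It is not Phoenix code: the class ByteBatcher, the method batchBySize, and the modeling of each mutation as a raw byte[] are assumptions, and the limit is passed in as a plain long rather than read from MUTATE_BATCH_SIZE_BYTES_ATTRIB. The transitional row-count check is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Phoenix. Each mutation is modeled as a byte[]. */
final class ByteBatcher {

    /**
     * Groups mutations into batches whose combined size never exceeds maxBatchBytes.
     * Step 1: a single mutation larger than the limit is rejected.
     * Step 2: smaller mutations are packed greedily, in order, into batches.
     */
    static List<List<byte[]>> batchBySize(List<byte[]> mutations, long maxBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;

        for (byte[] mutation : mutations) {
            // Step 1: one oversized mutation can never fit in any batch.
            if (mutation.length > maxBatchBytes) {
                throw new IllegalArgumentException(
                        "Mutation of " + mutation.length
                        + " bytes exceeds the batch limit of " + maxBatchBytes + " bytes");
            }
            // Step 2: close the current batch before it would overflow the limit.
            if (!current.isEmpty() && currentBytes + mutation.length > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(mutation);
            currentBytes += mutation.length;
        }
        // Flush whatever remains as the final batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With a single limit, the oversized-mutation check and the batching threshold cannot disagree, which is the point of collapsing the two properties into one.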

This way, when the deprecated properties are eventually removed, there's only one easy-to-understand
knob, MUTATE_BATCH_SIZE_BYTES_ATTRIB, for guarding against giant WALEdits, rather
than two that could be misconfigured to be contradictory or nonsensical.

> Make mutable batch size bytes-based instead of row-based
> --------------------------------------------------------
>                 Key: PHOENIX-541
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0-Release
>            Reporter: mujtaba
>            Assignee: Geoffrey Jacoby
>              Labels: newbie
>             Fix For: 4.10.0
>         Attachments: PHOENIX-541.patch
> With the current row-count-based configuration of the mutable batch size, the ideal batch
> size when creating indexes is around 800 rather than the current 15k, based on memory consumption,
> CPU and GC (data size: key: ~60 bytes, 14 integer columns in separate CFs).

This message was sent by Atlassian JIRA
