phoenix-dev mailing list archives

From "Ankit Singhal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-2143) Use guidepost bytes instead of region name in stats primary key
Date Tue, 12 Jan 2016 08:58:40 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Singhal updated PHOENIX-2143:
-----------------------------------
    Attachment: PHOENIX-2143_v2.patch

[~samarthjain], sorry about this. I had made changes in all of those functions, which I later
reverted because of the design change suggested by [~giacomotaylor], and I could not find a
good way to revert them without reformatting again. Could you please ignore it this time?

Regarding your question about removing the code for stats adjustment during split and merge:
we no longer need it. It was required when we maintained stats at the region level, where on
a merge we had to remove the stats for the daughter region, and on a split we had to divide
the stats of the parent region and assign them to the new daughter regions. Now that guideposts
are stored at the key level, we don't need this, and the same goes for the split update statistics test.
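
To illustrate the point above, here is a minimal sketch (not the Phoenix code; the keys,
the split point and the helper name guideposts_in_range are made up for the example). It
assumes guideposts are kept as one sorted list of row keys, so a region's guideposts are
derived from its boundaries rather than stored per region:

```python
from bisect import bisect_left

def guideposts_in_range(guideposts, start_key, stop_key):
    """Return guideposts g with start_key <= g < stop_key.

    An empty stop_key (b"") means "no upper bound", mirroring how an
    open-ended region boundary is usually written.
    """
    lo = bisect_left(guideposts, start_key)
    hi = len(guideposts) if stop_key == b"" else bisect_left(guideposts, stop_key)
    return guideposts[lo:hi]

# Hypothetical guidepost keys, stored sorted by key instead of per region.
guideposts = [b"c", b"f", b"k", b"p", b"t"]

# A single region covering the whole key space before the split...
before = guideposts_in_range(guideposts, b"", b"")

# ...and the two daughter regions after a split at b"h".
left = guideposts_in_range(guideposts, b"", b"h")
right = guideposts_in_range(guideposts, b"h", b"")
```

The stored list is never touched by the split; only the boundaries used to read it change,
which is why no adjustment step is needed in this model.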

Regarding the test case: it was added to check for duplicates that may result from improper
stats creation. The duplicate check could also be added to an existing test, but I wanted
to keep this one simple for easy debugging, since the keys are easily visible compared
to the other tests, which use arrays.

Yes, I ran all the tests and they are passing. PFA the updated patch with one bug fixed.

Regards,
Ankit Singhal

> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2143
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2143
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>         Attachments: PHOENIX-2143.patch, PHOENIX-2143_v2.patch, PHOENIX-2143_wip.patch, PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the primary
> key constraint. Instead, we should use the MIN_KEY column (which corresponds to the region
> start key). The advantage would be that the stats would then be ordered by region start key
> allowing us to approximate the number of guideposts which would be traversed given the
> start/stop row of a scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST operator
> for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats could simply
> be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to store the
> guide posts, but instead store them as individual rows with a schema like this.
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as you'd need
> to run a separate query to do the deletion of the old guideposts, followed by a commit of
> the new guideposts. The other disadvantage (besides requiring multiple queries) is that this
> couldn't be done transactionally.
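
The range query from the quoted description can be tried out with a small sketch. This is
not Phoenix: it uses Python's built-in sqlite3 module with a stand-in table named stats
(the real table is SYSTEM.STATS), made-up rows, and a hypothetical helper
estimate_guideposts. It only shows how ordering by min_key lets a scan range select the
per-region counts:

```python
import sqlite3

# In-memory stand-in for the stats table, keyed by region start key (min_key).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (min_key BLOB PRIMARY KEY, guide_posts_count INTEGER)"
)
# Hypothetical rows: one per region, with that region's guidepost count.
conn.executemany(
    "INSERT INTO stats VALUES (?, ?)",
    [(b"a", 10), (b"h", 7), (b"p", 4), (b"w", 9)],
)

def estimate_guideposts(conn, start_row, stop_row):
    """Sum the counts of regions whose start key lies strictly inside the scan range."""
    row = conn.execute(
        "SELECT COALESCE(SUM(guide_posts_count), 0) FROM stats "
        "WHERE min_key > ? AND min_key < ?",
        (start_row, stop_row),
    ).fetchone()
    return row[0]

inside = estimate_guideposts(conn, b"c", b"r")   # regions starting at b"h" and b"p"
nothing = estimate_guideposts(conn, b"x", b"z")  # no region starts in this range
```

With these rows the region starting at b"a" also contains the start row b"c" but is not
counted, which shows why the result is only an approximation.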



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
