asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (ASTERIXDB-2133) Unnecessary BinarySearch in GroupFrameAccessor
Date Wed, 01 Nov 2017 04:13:00 GMT


ASF subversion and git services commented on ASTERIXDB-2133:

Commit c04046c11be062c4320ba7e0bbd5258ffb600ad3 in asterixdb's branch refs/heads/master from
[;h=c04046c ]

[ASTERIXDB-2133] Fix unncessary binary search in GroupFrameAccessor

- user model changes: no
- storage format changes: no
- interface changes: no

- GroupFrameAccessor holds a list of frames from a run during the merge
step of merge sort. However, everytime we access a tuple, it performs
binary search to get the physical tuple index. This patch fixes this
by remembering the last accessed frame. It is expected that tuples
are accessed sequentially (since it's the merge step), which greatly
reduces binary searches

Change-Id: I4a1b19ad47f6b1dda4bd5c417932e4c9ba36a714
Sonar-Qube: Jenkins <>
Tested-by: Jenkins <>
Contrib: Jenkins <>
Integration-Tests: Jenkins <>
Reviewed-by: Ian Maxon <>

> Unnecessary BinarySearch in GroupFrameAccessor
> ----------------------------------------------
>                 Key: ASTERIXDB-2133
>                 URL:
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: HYR - Hyracks
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>            Priority: Major
> During the merge step of merge sort, if there is enough memory but only a few of runs
to be merged, we would load multiple frames per run into the GroupFrameAccessor. Every time
when we access a tuple, GroupFrameAccessor performs binary search over the inner frames to
translate logical tuple index into the physical one (inner frame Id + index).
> However, this is highly inefficient, and partially results in the fact that more memory
budget of the sort operation would result in slower performance. Since GroupFrameAccessor
is only used by merge sort, it is expected that tuples are accessed sequentially, instead
of randomly. Specially optimizations can be adopted based on this sequentially access pattern.

This message was sent by Atlassian JIRA

View raw message