asterixdb-notifications mailing list archives

From "Yingyi Bu (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: Avoid always merging old components in prefix policy
Date Mon, 12 Jun 2017 19:21:54 GMT
Yingyi Bu has posted comments on this change.

Change subject: Avoid always merging old components in prefix policy
......................................................................


Patch Set 6:

(3 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1818/6/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java
File hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java:

PS6, Line 201: if (mergableIndexes != null) {
             :             return mergableIndexes.getRight() - mergableIndexes.getLeft() + 1;
             :         } else {
             :             return 0;
             :         }
return mergableIndexes == null ? 0 : mergableIndexes.getRight() - mergableIndexes.getLeft() + 1;
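A minimal standalone sketch of the single-expression form, runnable outside Hyracks. It assumes a hypothetical `IndexRange` record in place of the `Pair<Integer, Integer>` the policy actually uses, so `left()`/`right()` stand in for `getLeft()`/`getRight()`:

```java
public class MergeableCountSketch {
    // Hypothetical stand-in for the Pair<Integer, Integer> returned by the policy:
    // an inclusive [left, right] range of component indexes.
    record IndexRange(int left, int right) {}

    // Null-safe count of components in an inclusive index range,
    // written as the single conditional expression suggested above.
    static int countMergeable(IndexRange mergableIndexes) {
        return mergableIndexes == null ? 0 : mergableIndexes.right() - mergableIndexes.left() + 1;
    }

    public static void main(String[] args) {
        System.out.println(countMergeable(null));                 // 0
        System.out.println(countMergeable(new IndexRange(2, 5))); // 4
    }
}
```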


PS6, Line 248: for (int i = startIndex; i <= endIndex; i++) {
             :             mergableComponents.add(immutableComponents.get(i));
             :         }
mergableComponents.addAll(immutableComponents.subList(startIndex, endIndex+1)) ?
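To check that the two forms copy the same components, here is a small standalone comparison; plain strings stand in for `ILSMDiskComponent`, and the helper names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class SubListCopySketch {
    // Loop form from patch set 6: copies the inclusive range [startIndex, endIndex].
    static <T> List<T> copyWithLoop(List<T> immutableComponents, int startIndex, int endIndex) {
        List<T> mergableComponents = new ArrayList<>();
        for (int i = startIndex; i <= endIndex; i++) {
            mergableComponents.add(immutableComponents.get(i));
        }
        return mergableComponents;
    }

    // Suggested form: subList's upper bound is exclusive, hence endIndex + 1.
    static <T> List<T> copyWithSubList(List<T> immutableComponents, int startIndex, int endIndex) {
        List<T> mergableComponents = new ArrayList<>();
        mergableComponents.addAll(immutableComponents.subList(startIndex, endIndex + 1));
        return mergableComponents;
    }

    public static void main(String[] args) {
        List<String> components = List.of("c0", "c1", "c2", "c3", "c4");
        System.out.println(copyWithLoop(components, 1, 3));    // [c1, c2, c3]
        System.out.println(copyWithSubList(components, 1, 3)); // [c1, c2, c3]
    }
}
```

Because `addAll` copies the references, `mergableComponents` is not tied to the backing list the way a bare `subList` view would be.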


PS6, Line 273:  private Pair<Integer, Integer> getMergableComponentsIndex(List<ILSMDiskComponent> immutableComponents)
It feels like some work is repeated in this method, because this method is called
for each new component that results from a merge.

I think we could potentially make this method an O(n) operation if we add a parameter newComponent
to diskComponentAdded(final ILSMIndex index, boolean fullMergeIsRequested). (Note that
there are scenarios where we are interested in a large number of components, e.g., Cloudberry
(CB), for either better read performance or better ingestion performance.)

I'm thinking that maybe we can differentiate whether the new component resulted from a FLUSH
or from a MERGE based on its size, for example. Let's say we keep the list of components
ordered from youngest to oldest (without reversing it):

-- for a new FLUSH-result component, we just need a sliding window to decide whether
it needs to be merged with older FLUSH-result components. (We need to do that because, due to
line 58, this method might not be called for a FLUSH-result component.)

-- for a new MERGE-result component Cm, we just need to check its preceding component to identify
a contiguous mergeable window (if any).


We can use three properties to simplify the mergeable window selection here:

1.  This method is called once each time a MERGE-result component is added;

2.  A FLUSH-result component should probably only merge with other FLUSH-result components;

3.  Whenever a new component is added, either from a FLUSH or a MERGE, we only need to identify a mergeable window
(with older components) starting from that component.
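One possible reading of the proposal, sketched as a single linear scan. Everything here is hypothetical rather than the actual PrefixMergePolicy code: the `Component` record, the size threshold used to guess whether a component came from a FLUSH, and the parameters `maxMergableSize` and `maxToleranceCount` (modeled loosely on the prefix policy's size cap and component-count tolerance). The list is ordered youngest to oldest and the window starts at the newly added component (property 3), so each call visits each older component at most once:

```java
import java.util.Arrays;
import java.util.List;

public class PrefixWindowSketch {
    // Hypothetical component descriptor; the real policy works on ILSMDiskComponent.
    record Component(String name, long sizeBytes) {}

    // Assumption: a component no larger than flushSizeThreshold is treated as a
    // FLUSH result (the review suggests telling FLUSH and MERGE results apart by size).
    static boolean isFlushResult(Component c, long flushSizeThreshold) {
        return c.sizeBytes() <= flushSizeThreshold;
    }

    // One O(n) scan per added component, from index `start` toward older components.
    // Returns the inclusive window {start, end}, or null if nothing should be merged.
    static int[] findMergeableWindow(List<Component> youngestFirst, int start,
            long maxMergableSize, int maxToleranceCount, long flushSizeThreshold) {
        boolean flushOnly = isFlushResult(youngestFirst.get(start), flushSizeThreshold);
        long totalSize = 0;
        int end = start - 1; // empty window until the first component is accepted
        for (int i = start; i < youngestFirst.size(); i++) {
            Component c = youngestFirst.get(i);
            // Property 2: a window started by a FLUSH result only grows over FLUSH results.
            if (flushOnly && !isFlushResult(c, flushSizeThreshold)) {
                break;
            }
            // Stop before the merged result would exceed the size cap.
            if (totalSize + c.sizeBytes() > maxMergableSize) {
                break;
            }
            totalSize += c.sizeBytes();
            end = i;
        }
        int count = end - start + 1;
        return count > maxToleranceCount ? new int[] {start, end} : null;
    }

    public static void main(String[] args) {
        List<Component> youngestFirst = List.of(
                new Component("c0", 10), new Component("c1", 10), new Component("c2", 10),
                new Component("c3", 500), new Component("c4", 10));
        // New FLUSH result c0: the window covers c0..c2 and stops at the MERGE result c3.
        System.out.println(Arrays.toString(findMergeableWindow(youngestFirst, 0, 1000, 2, 64))); // [0, 2]
    }
}
```

For a MERGE result the same scan runs without the FLUSH-only restriction; whether it should look at only one neighbouring component, as the MERGE bullet suggests, is a design choice this sketch does not settle.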


Thoughts?

Maybe we need an offline discussion.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1818

Gerrit-MessageType: comment
Gerrit-Change-Id: I464da3fed38cded0aee7b319a35664eae069a2ba
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <cluo8@uci.edu>
Gerrit-Reviewer: Ian Maxon <imaxon@apache.org>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng.jia@gmail.com>
Gerrit-Reviewer: Luo Chen <cluo8@uci.edu>
Gerrit-Reviewer: Yingyi Bu <buyingyi@gmail.com>
Gerrit-Reviewer: abdullah alamoudi <bamousaa@gmail.com>
Gerrit-HasComments: Yes
