flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Some thoughts about query max && min block count in CompactingHashTable
Date Tue, 18 Aug 2015 12:12:21 GMT
Hi!

The thinking is good. Although in this case, it is probably not beneficial
to replace the list by a heap or so, due to the rare usage of that function.

As you said, the cost depends on how often the "getMaxPartition()" is
called.

Current approach: seldom O(n) - (only on function call)
Heap approach: frequent O(log(n)) - (every time the block count changes).

If I see it correctly, the getMaxPartition() function is called only in
exception cases, when operation stops anyways.

All in all, we should probably keep the list. If the function would be
called frequently, we should change this to a tree.

BTW: As a side note, what makes it even more tricky is that linear array
list traversal is quite more cache efficient, while logarithmic tree
traversal checks fewer elements, but has more memory jumps (cache misses)
on the way.

Greetings,
Stephan








On Tue, Aug 18, 2015 at 11:33 AM, huangwei (G) <huangwei111@huawei.com>
wrote:

> Hi,
> I found some like the code following may affect performance( Time
> complexity is O(n) ):
>
> private int getMaxPartition() {
>    int maxPartition = 0;
>    for(InMemoryPartition<T> p1 : this.partitions) {
>       if(p1.getBlockCount() > maxPartition) {
>          maxPartition = p1.getBlockCount();
>       }
>    }
>    return maxPartition;
> }
>
> These codes is in the FLINK-RUNTIME and may be called many times when
> flink runs.
> As I learned, querying a max(min) number from a set of numbers can be
> optimized into O(log(n)) even O(1) ( .i.e RMQ, heap, segment tree, treap….
> ).
>
> And there is a thought of mine:
>
> Using TreeMap as a Temporary storage for the size of partition pages.
>
> Key stands for the size of partition pages.
>
> Value stands for the count of the same size.
>
> The TreeMap will synchronous update when this.partitions adds or removes a
> element.
>
> So we can just use TreeMap.lastKey() as the
> maxPartition(TreeMap.firstKey() as the minPartition).
>
> And all the operations(put, remove, lastKey, firstKey) in TreeMap cost
> O(log(n)).
>
> If getMaxPartition() is called many times, this may improve performance.
>
>
>
> And I am here looking for your opinions.
>
>
> Greetings,
> Huang Wei
> 华为技术有限公司 Huawei Technologies Co., Ltd.
>
>
> Tel:+86 18106512602
> Email:huangwei111@huawei.com
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message