flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gábor Gévay <gga...@gmail.com>
Subject Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance
Date Thu, 23 Mar 2017 17:31:32 GMT
Hello,

> Second, additional classes will turn performance critical callsites megamorphic.

Yes, this is a completely valid point, thanks for raising this issue
Greg. We were planning to have an offline discussion tomorrow with
Pattarawat about this. We have a few options:
1. We could fuse the QuickSort and NormalizedKeySorter into a single
class when generating the code.
2. As you said, duplicating the QuickSort class might also be a nice
solution, if we can find a simple way to pull it off.
3. We might do some benchmarks and determine that the effect of this
issue is negligible and therefore we can ignore it. But I'm worried
that this is not the case, so we'll probably have to go with one of
the above options.

Best,
Gábor



On Thu, Mar 23, 2017 at 6:12 PM, Greg Hogan <code@greghogan.com> wrote:
> I would be more than happy to shepherd and review this PR.
>
> I have two discussion points. First, a strategy for developing with templates. IntelliJ
has a FreeMarker plugin but we lose formatting and code completion. To minimize this issue
we can retain the untemplated code in an abstract class which is then concretely subclassed
by the template.
>
> Second, additional classes will turn performance critical callsites megamorphic. Stephan
noted this issue in his work on MemorySegment.
>   http://flink.apache.org/news/2015/09/16/off-heap-memory.html
>
> For example, QuickSort calls IndexedSortable#compare and IndexedSortable#swap. With multiple
compiled implementations of the sorter template these callsites can no longer be inlined (the
same is true with NormalizedKeySorter and FixedLengthRecordSorter if the latter was instrumented).
>
> I have not found a way to duplicate a Java class at runtime, but we may be able to use
Janino to compile a class which is then uniquely renamed: each IndexSortable type would map
to a different QuickSort type (same bytecode, but uniquely optimized). This should also boost
performance of runtime operators calling user defined functions.
>
> Given the code already written, I expect we can refactor, review, and benchmark for the
1.3 release.
>
> Greg
>
>
>> On Mar 21, 2017, at 3:46 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>>
>> Hi Pat,
>>
>> thanks a lot for this great proposal! I think it is very well structured and has
the right level of detail.
>> The improvements of your performance benchmarks look very promising and I think code-gen'd
sorters would be a very nice improvement.
>> I like that you plan to add a switch to activate this feature.
>>
>> In order move on, we will need a committer who "champions" your FLIP, reviews the
pull request, and eventually merges it.
>>
>> @Greg and @Stephan, what do you think about this proposal?
>>
>> Best, Fabian
>>
>>
>> 2017-03-14 16:10 GMT+01:00 Pattarawat Chormai <pat.chormai@gmail.com <mailto:pat.chormai@gmail.com>>:
>> Hi all,
>>
>> I would like to initiate a discussion of applying code generation to NormalizedKeySorter.
The goal is to improve sorting performance by generating suitable NormalizedKeySorter for
underlying data. This generated sorter will contains only necessary code in important methods,
such as swap and compare, hence improving sorting performance.
>>
>> Details of the implementation is illustrated at FLIP-18 : Code Generation for improving
sorting performance. <https://cwiki.apache.org/confluence/display/FLINK/FLIP-18%3A+Code+Generation+for+improving+sorting+performance>
>>
>>
>> Also, because we’re doing it as a course project at TUB, we have completed the
implementation and made a pull-request <https://github.com/apache/flink/pull/3511> to
Flink repo already.
>>
>> From our evaluation, we have found that the pull-request reduces sorting time around
7-10% and together with FLINK-3722 <https://issues.apache.org/jira/browse/FLINK-3722>
the sorting time is decreased by 12-20%.
>>
>>
>>
>> Please take a look at the document and the pull-request and let me know if you have
any suggestion.
>>
>> Best,
>> Pat
>>
>

Mime
View raw message