flink-issues mailing list archives

From ggevay <...@git.apache.org>
Subject [GitHub] flink issue #3511: [Flink-5734] code generation for normalizedkey sorter
Date Fri, 29 Sep 2017 15:08:52 GMT
Github user ggevay commented on the issue:

    https://github.com/apache/flink/pull/3511
  
    > IMHO, in addition to these changes, there are still some potential improvements we
can do about the sorter, like deserialization when comparing the real records.
    
    Do you mean `NormalizedKeySorter.compareRecords`? This calls `comparator.compareSerialized`, which is currently implemented for most types by simply deserializing the records and then comparing the deserialized form. It might bring some performance improvement if this had a real implementation that deserializes only the key fields (those relevant to the comparison). The hard part is handling nullable fields, which make the offsets of individual fields unpredictable.
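    To illustrate the difference, here is a hypothetical sketch (the record format and all class/method names below are made up for the example, not Flink's actual code): comparing two serialized records by full deserialization vs. by reading only the key field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record format: an (int key, String payload) pair,
// serialized as the key followed by the payload.
class DeserializingComparator {

    static byte[] record(int key, String payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(key);
            out.writeUTF(payload);
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Current pattern: deserialize the whole record, although only the key
    // matters for the comparison.
    static int compareSerialized(byte[] a, byte[] b) {
        return Integer.compare(readKeyFully(a), readKeyFully(b));
    }

    static int readKeyFully(byte[] rec) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(rec));
            int key = in.readInt();
            in.readUTF(); // the payload is deserialized too -- wasted work
            return key;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Key-only variant: reads just the leading int. Easy here because the key
    // offset is fixed; nullable fields would make such offsets unpredictable.
    static int compareKeyOnly(byte[] a, byte[] b) {
        try {
            int ka = new DataInputStream(new ByteArrayInputStream(a)).readInt();
            int kb = new DataInputStream(new ByteArrayInputStream(b)).readInt();
            return Integer.compare(ka, kb);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

    With a fixed-offset key the key-only variant is trivial; the nullable-field problem mentioned above is exactly that the key's byte offset stops being a constant.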
    
    So I think a simpler and better approach is to just make sure that most types have a good implementation of `putNormalizedKey`; then `NormalizedKeySorter.compareRecords` would be called only rarely, so its performance wouldn't really matter.
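    For example, for an int key a good normalized key can be produced by flipping the sign bit and writing the value in big-endian order, so that unsigned byte-wise comparison of the key prefixes agrees with the signed int ordering. The class below is a simplified, hypothetical sketch of that idea, not Flink's actual comparator code:

```java
import java.nio.ByteBuffer;

// Simplified sketch of a normalized-key writer for int keys: flipping the
// sign bit makes the unsigned byte order of the big-endian encoding agree
// with signed int ordering.
class IntNormalizedKey {
    static void putNormalizedKey(int value, ByteBuffer target, int offset) {
        // ByteBuffer.putInt writes big-endian by default.
        target.putInt(offset, value ^ Integer.MIN_VALUE);
    }

    // Unsigned byte-wise comparison, as the sorter does on the key prefixes.
    static int compareKeys(ByteBuffer a, ByteBuffer b, int len) {
        for (int i = 0; i < len; i++) {
            int cmp = (a.get(i) & 0xFF) - (b.get(i) & 0xFF);
            if (cmp != 0) return cmp;
        }
        return 0;
    }
}
```

    When such a prefix fully decides the order, the sorter never has to touch the serialized records at all.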
    
    > We are working on a project to fully code generate the algorithm Flink runtime used,
and borrowed lots of idea of this PR, thanks!
    
    I'm happy to hear that!
    
    Btw., I think one large potential of code generation is eliminating the overhead of the very many virtual function calls throughout the runtime, by making the calls monomorphic [1,2]. For example, a custom `MapDriver` could be generated for each map UDF; this custom class would always call the same UDF, making the call monomorphic and thus easily optimizable by the JVM [1,2]. (I think @greghogan was also thinking about this.)
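    A minimal, hypothetical sketch of the idea (the interface and class names are made up; Flink's real `MapDriver` is far more involved): the generic driver's `udf.map(...)` call site sees many receiver types across a job, while the generated driver's call site sees exactly one.

```java
// Stand-in for Flink's MapFunction interface.
interface MapFunction<I, O> { O map(I value); }

// Generic driver: the udf.map(...) call site can become megamorphic once
// many different UDF classes have been loaded, inhibiting JIT inlining.
class GenericMapDriver<I, O> {
    final MapFunction<I, O> udf;
    GenericMapDriver(MapFunction<I, O> udf) { this.udf = udf; }
    O process(I record) { return udf.map(record); }
}

// A concrete UDF...
class DoubleIt implements MapFunction<Integer, Integer> {
    public Integer map(Integer v) { return v * 2; }
}

// ...and a "generated" driver specialized to it: the field has the concrete
// type, so the call site is monomorphic and trivially inlinable.
class GeneratedDoubleItDriver {
    final DoubleIt udf = new DoubleIt();
    Integer process(Integer record) { return udf.map(record); }
}
```

    The generated variant does nothing different functionally; the whole point is the statically known receiver type at the call site.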
    
    For example, the following virtual calls are on the per-record code path:
    - drivers (such as `MapDriver`) calling `MutableObjectIterator.next`
    - drivers calling the UDF
    - drivers calling `output.collect`
    - `ReusingDeserializationDelegate` or `NonReusingDeserializationDelegate` calling `serializer.deserialize`
    - `SpillingResettableMutableObjectIterator` calling `serializer.serialize` and `serializer.deserialize`
    - `PojoSerializer`, `TupleSerializer`, `RowSerializer`, etc. calling their field serializers
    - `CountingCollector` calling `collector.collect`
    - `RecordWriter` calling `channelSelector.selectChannels`
    - `SerializationDelegate` calling `serializer.serialize`
    - `StreamElementSerializer` calling methods on its `typeSerializer`
    - `OutputEmitter` calling `comparator.hash`
    - `ComparableKeySelector` calling `comparator.extractKeys`
    - `assignToKeyGroup` calling `key.hashCode`
    
    I think most of the above calls are megamorphic (especially in larger Flink jobs with many operators), which makes them slow [1,2]. They could be made monomorphic by code-generating custom versions of these classes, where the customization fixes the concrete types of the call targets. (I think this could probably be done independently of the Table API.)
    
    Another opportunity for code generation is customizing `OutputEmitter` for the number of channels: there are places where a modulo operation is performed, which is much faster when the divisor is a compile-time constant, because the compiler can then use the tricks described in this paper:
    https://gmplib.org/~tege/divcnst-pldi94.pdf
    The same optimization could be applied in `KeyGroupRangeAssignment`, where there are divisions by `maxParallelism`.
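    A tiny sketch of the point (class and method names are hypothetical): the only difference between the two methods below is whether the divisor is a compile-time constant, which is exactly what a generated emitter could guarantee. With the constant, the JIT can strength-reduce the modulo into a multiply-and-shift; with the variable it has to emit an actual division instruction.

```java
// Sketch: channel selection by hash, once with a constant divisor
// (strength-reducible by the JIT) and once with a runtime-variable one.
class ChannelSelect {
    static final int NUM_CHANNELS = 8; // known at code-generation time

    // Constant divisor: candidate for multiply-and-shift strength reduction.
    static int selectConst(int hash) {
        return (hash & 0x7FFFFFFF) % NUM_CHANNELS;
    }

    // Variable divisor: a real idiv/udiv on most platforms.
    static int selectVar(int hash, int numChannels) {
        return (hash & 0x7FFFFFFF) % numChannels;
    }
}
```

    (The masking with `0x7FFFFFFF` just keeps the hash non-negative before the modulo; it is not part of the optimization itself.)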
    
    > we need more type information control and flexible code generating supports, so we
choose to do it through the Table API & SQL. How do you see this approach?
    
    Having more information about the types, UDFs, etc. of a program can certainly help. Unfortunately I don't know too much about the Table API & SQL, but I have a few random thoughts:
    - Since you are generating the UDFs, it should be possible to give them good object-reuse behavior, without users having to worry about the non-trivial object-reuse rules.
    - For basic types (int, boolean, etc.) the generated code could use `IntValue`, `BooleanValue`, etc., facilitating more object reuse, again without making users write ugly boilerplate (as they have to in the DataSet/DataStream API when they want to use these classes).
    - The same goes for `StringValue`.
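    As a hypothetical sketch of the object-reuse point (stand-in classes, not Flink's actual `IntValue` or generated code): a generated UDF can write its result into one reused mutable holder instead of boxing a new object per record.

```java
// Stand-in for Flink's mutable IntValue wrapper type.
class MutableInt {
    private int value;
    int get() { return value; }
    void set(int v) { value = v; }
}

// A "generated" UDF that reuses one holder instance across all records,
// so the per-record path allocates nothing.
class GeneratedSumUdf {
    private final MutableInt reused = new MutableInt();

    MutableInt map(int a, int b) {
        reused.set(a + b); // overwrite in place instead of boxing an Integer
        return reused;
    }
}
```

    Hand-written code like this is error-prone (callers must not hold on to the returned object), but generated code can follow the reuse rules mechanically.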
    
    Btw. have you seen this PR for code generation for POJO serializers and comparators? https://github.com/apache/flink/pull/2211
    It has some issues, so it is not as close to merging as this PR, but maybe I'll try to
tie that up as well some time.
    
    [1] http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/
    [2] https://shipilev.net/blog/2015/black-magic-method-dispatch/

