flink-issues mailing list archives

From fhueske <...@git.apache.org>
Subject [GitHub] incubator-flink pull request: [FLINK-758] Add count operator to Da...
Date Sun, 07 Sep 2014 14:29:48 GMT
Github user fhueske commented on the pull request:

    https://github.com/apache/incubator-flink/pull/63#issuecomment-54748255
  
    I had a look at this PR and found a few issues:
    - it contains changes for several independent features
      - Initial value for ReduceFunction
      - Count operator
      - many cosmetic changes / documentation improvements
    - I suspect that rebasing this PR onto the current master will cause many merge
      conflicts. It might be worthwhile to split these changes into independent PRs to
      make merging easier.
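    - Counting for grouped datasets is done with a non-combinable GroupReduceFunction,
      which is not very efficient: without a combine step, every record of a group has to
      be shipped to the reducer before it is counted.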
    - An initial value for ReduceFunction is only supported for AllReduce. I see that the
      original motivation for this (a 0-valued count for empty datasets) does not apply to
      grouped ReduceFunctions, but that is not the only way an initial value could be used.
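
To illustrate the efficiency point about grouped counts, here is a minimal sketch in plain Java. It deliberately does not use the Flink API: the class `CombinableCountSketch` and its methods `combine`, `reduce`, and `countAll` are hypothetical names chosen for this example, and partitions are modelled as simple lists of keys. It only shows the general idea that a count can be pre-aggregated per partition and then summed, and that an ungrouped count starting from 0 is well defined for empty input.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Flink code.
final class CombinableCountSketch {

    // Combine step: pre-aggregate one partition into (key -> partial count).
    // Only one entry per distinct key has to leave the partition.
    static Map<String, Long> combine(List<String> partition) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : partition) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Reduce step: sum the partial counts of all partitions per key.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }

    // Ungrouped count with an initial value of 0, so that an empty
    // input yields 0 instead of no result at all.
    static long countAll(List<List<String>> partitions) {
        long count = 0L; // initial value
        for (List<String> partition : partitions) {
            count += partition.size();
        }
        return count;
    }
}
```

Because addition is associative and commutative, summing the partial counts gives the same result as counting every record at the reducer, while moving less data between the two steps.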

