flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ggevay <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...
Date Mon, 30 May 2016 06:16:42 GMT
Github user ggevay commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/ReduceCombineDriver.java
    @@ -42,34 +44,38 @@
      * Combine operator for Reduce functions, standalone (not chained).
      * Sorts and groups and reduces data, but never spills the sort. May produce multiple
      * partially aggregated groups.
    - * 
    + *
      * @param <T> The data type consumed and produced by the combiner.
     public class ReduceCombineDriver<T> implements Driver<ReduceFunction<T>,
T> {
     	private static final Logger LOG = LoggerFactory.getLogger(ReduceCombineDriver.class);
     	/** Fix length records with a length below this threshold will be in-place sorted, if
possible. */
     	private static final int THRESHOLD_FOR_IN_PLACE_SORTING = 32;
     	private TaskContext<ReduceFunction<T>, T> taskContext;
     	private TypeSerializer<T> serializer;
     	private TypeComparator<T> comparator;
     	private ReduceFunction<T> reducer;
     	private Collector<T> output;
    +	private DriverStrategy strategy;
     	private InMemorySorter<T> sorter;
     	private QuickSort sortAlgo = new QuickSort();
    +	private ReduceHashTable<T> table;
     	private List<MemorySegment> memory;
    -	private boolean running;
    +	private volatile boolean canceled;
    --- End diff --
    I think it's better to have `volatile` here. This variable will be set from a different
thread, and `volatile` is not only for atomicity, but also for memory consistency (seeing
the effect of a write in an other thread). If a variable is not volatile then the compiler
may assume in certain cases that it is only modified and read by one thread (17.4.2-5. in
[1]). (Also see [2].)
    Note: omitting the volatile probably wouldn't cause any actual bug here, because the loop
bodies are large so the compiler probably won't inline and analyze the entire call tree to
look for writes to this flag, but I wouldn't risk it. Also, I don't know how common it is
in Java that this stuff causes actual problems, but it actually happened to me in C++ once
that a loop like this was effectively turned into `while(true)` by the compiler, because my
flag was not volatile. It was a nasty debugging session. (Another problematic thing that the
compiler is allowed to do with non-volatile variables is to cache the value of the variable
in a register, and not read it from memory if it can turn all possible writes to it by the
current thread into a write to the register.)
    [1] http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.3
    [2] http://stackoverflow.com/questions/106591/do-you-ever-use-the-volatile-keyword-in-java

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.

View raw message