flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: max-fan
Date Thu, 03 Sep 2015 09:46:38 GMT
Hi Greg!

That number should control the merge fan in, yes. Maybe a bug was
introduced a while back that prevents this parameter from being properly
passed through the system. Have you modified the config value in the
cluster, on the client, or are you starting the job via the command line,
in which case both are the same? In any case, we'll fix that soon,
definitely. Could you open an issue for that?


Concerning the sub-optimal merging: You are right, this could be improved,
like you said. Right mow, the attempt is to create uniform files, but your
suggestion would be more efficient.
The part is here in the code
https://github.com/StephanEwen/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/UnilateralSortMerger.java#L1400

Is this a critical issue for you? Would you be up for making a patch for
this? It should be a fairly isolated change.


Greetings,
Stephan


On Thu, Sep 3, 2015 at 3:02 AM, Greg Hogan <code@greghogan.com> wrote:

> When workers spill more than 128 files, I have seen these fully merged
> into one or more much larger files. Does the following parameter allow more
> files to be stored without requiring the intermediate merge-sort? I have
> changed it to 1024 without effect. Also, it appears that the entire set of
> small files is reprocessed rather than the minimum required to attain the
> max fan-in (i.e., starting with 150 files, 23 would be merged leaving 128
> to be processed concurrently).
>
> taskmanager.runtime.max-fan: The maximal fan-in for external merge joins
> and fan-out for spilling hash tables. Limits the number of file handles per
> operator, but may cause intermediate merging/partitioning, if set too small
> (DEFAULT: 128).
>
> Greg Hogan
>

Mime
View raw message