flink-dev mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: Support for Total Order Partitioner ?
Date Fri, 05 Dec 2014 10:39:34 GMT
Hi Johannes,

right now there is no straightforward way to write data in total order in
Flink.

The engine can locally sort data before it is given to a data sink.
However, this is not exposed in the API right now, though it should be fairly
easy to add (FLINK-1105). Given this feature, you can have a total order with
a data sink with a degree of parallelism (DOP) of 1 (obviously, that doesn't
work for large output). Flink recently added custom partitioners, which can
be used for manual range partitioning if you know the distribution of the
sort keys. Unfortunately, custom partitioners are not documented yet
(FLINK-1260); see the PR for information:
https://github.com/apache/incubator-flink/pull/207.
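
To illustrate what manual range partitioning means here, a minimal sketch
in plain Java follows. It is not Flink's API: the class name
RangePartitionSketch, the method partition, and the upperBounds array are
hypothetical names chosen for this example, and the bounds are assumed to
be known in advance, sorted ascending, and distinct. A custom partitioner
would wrap the same lookup.

```java
import java.util.Arrays;

public class RangePartitionSketch {

    // Returns the index of the range that `key` falls into.
    // upperBounds[i] is the largest key routed to partition i; keys above
    // the last bound go to the final partition (index upperBounds.length).
    static int partition(int key, int[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        // Assumed key distribution: three partitions (<=10, 11..20, >20).
        int[] upperBounds = {10, 20};
        for (int key : new int[] {3, 10, 11, 20, 42}) {
            System.out.println(key + " -> partition " + partition(key, upperBounds));
        }
    }
}
```

Because every key in partition i is smaller than every key in partition
i+1, sorting each partition locally and reading the output files in
partition order gives a total order.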

For automatically fully sorted output, Flink would need to derive the
distribution itself just as you described for PIG. This would be a
non-trivial change.
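
As a rough idea of what deriving the distribution involves, the sketch
below picks range boundaries from a sample of the keys by taking evenly
spaced quantiles. It is plain Java, not anything that exists in Flink;
SplitPointSketch and splitPoints are hypothetical names, and the sample is
assumed to be non-empty and to fit in memory. Collecting the sample in a
distributed job is the hard part and is not shown.

```java
import java.util.Arrays;

public class SplitPointSketch {

    // Picks numPartitions - 1 upper bounds from a sample of the keys, so
    // that each range covers roughly the same share of the sample.
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] bounds = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Position of the last sample element that belongs to range i - 1.
            int idx = (int) ((long) i * sorted.length / numPartitions) - 1;
            bounds[i - 1] = sorted[Math.max(idx, 0)];
        }
        return bounds;
    }

    public static void main(String[] args) {
        int[] sample = {5, 1, 9, 3, 7, 2, 8, 4, 6};
        System.out.println(Arrays.toString(splitPoints(sample, 3)));
    }
}
```

The quality of the resulting partitions depends on how representative the
sample is; heavily skewed or duplicated keys can produce equal bounds and
therefore empty ranges.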

tl;dr:
With FLINK-1105 fixed and a custom partitioner (given you know the
distribution), you can have totally ordered output in parallel.

2014-12-04 21:54 GMT+01:00 Kirschnick, Johannes <
johannes.kirschnick@tu-berlin.de>:

> Hi,
>
>
> I have a small question about custom partitioners.
>
>
> I couldn't really find a method in the Java API which lets me partition
> the dataset in a total order.
>
> Is that something that I just overlooked, or is that something not really
> supported?
>
>
> The use case is writing out data so that it can be consumed by another
> tool, and also for quick human inspection.
>
> I can easily apply the sorting on a hash partition, but I would lose the
> total order, i.e. that all keys in partition one come before or after all
> keys in partition two, etc.
>
>
> PIG supports this by mapping the ORDER BY statement to 2 jobs. The first
> one samples the data to build a sketch to approximate the key distribution
> and then feeds the result into the total order partitioner.
>
>
> Thanks
>
>
> Johannes
>
