flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Hogan <c...@greghogan.com>
Subject Re: Variable Tuple Type
Date Wed, 15 Mar 2017 19:49:01 GMT
Hi Max,

Belated response but this looks to be the same problem I am working to
solve in Gelly with graph data in FLINK-3695 [0]. These arrays allow for
object reuse. Interface is here [1]. Additional Value types are easy to add
but Long, Int, and String are most common to Gelly.

Suggestions are welcome, and please let us know how you've solved your


[0] https://issues.apache.org/jira/browse/FLINK-3695

On Mon, Dec 5, 2016 at 11:28 AM, Max Kie├čling <mailing@kopfueber.org> wrote:

> Hey,
> for a project we need to represent data as lists. So each entry in the
> DataSets basically holds a list of basic data type elements. When
> processing the data we keep joining lists of the same shape and so the
> list size grows over time
> e.g. (a,b,c) x (c,d,e) -> (a,b,c,d,e)
> Currently our solution basically is
> to use a `DataSet<List<Long>>`.
> The problem with this is, that the performance seems to be quite poor
> compared to using tuples. When we compare the same job using either
> Tuples or Lists, Tuples seem to be 2-10 times faster.
> However since we can't know in advance how many elements the list will
> have for a given job, using the Tuple0-25 would be both cumbersome and
> complex, especially if the list size outgrows 25.
> Using the Record class, the results look promising but using Tuples is
> still double as fast.
> In general our tests yield that the runtime is
> Tuple < Array < Record < List
> So my question is, do you see a possible way to create a variable length
> tuple type which can grow almost indefinitely while keeping most of the
> benefits of the TupleXX classes but skipping lots of the overhead of
> Record (like keeping track of possible null values etc)
> Thanks a lot
> Best Max

View raw message