flink-dev mailing list archives

From Gyula Fora <gyf...@apache.org>
Subject Re: how load/group with large csv files
Date Tue, 21 Oct 2014 12:08:51 GMT
Hey,

Using arrays is probably a convenient way to load the data.

I think groupBy with a field index, the way you described it, only works for tuples at the
moment. To group on an array field, you would need to write a key extractor and pass it to groupBy.
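
To illustrate the key-extractor idea, here is a minimal sketch in plain Java (standard library only, not the Flink API). The names CsvGrouping, parseLine, column and groupAndSort are made up for this example, and the comma split assumes no quoted fields. In Flink the corresponding hook should be a KeySelector implementation passed to groupBy; the sketch only shows what such an extractor does with a String[] row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical helper class, not part of Flink.
class CsvGrouping {

    // Split one CSV line into fields. Naive: assumes no quoted commas.
    // The -1 limit keeps trailing empty fields so every row has all columns.
    static String[] parseLine(String line) {
        return line.split(",", -1);
    }

    // Key extractor: picks one column of the row to use as the key.
    static Function<String[], String> column(int index) {
        return row -> row[index];
    }

    // Group rows by groupKey, then sort each group by sortKey.
    // Sorting is lexicographic because all fields are strings.
    static Map<String, List<String[]>> groupAndSort(
            List<String[]> rows,
            Function<String[], String> groupKey,
            Function<String[], String> sortKey) {
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(groupKey.apply(row), k -> new ArrayList<>()).add(row);
        }
        for (List<String[]> group : groups.values()) {
            group.sort(Comparator.comparing(sortKey));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String line : Arrays.asList("b,2,x", "a,9,y", "a,1,z")) {
            rows.add(parseLine(line));
        }
        // Group on column 0 and sort within each group on column 1.
        Map<String, List<String[]>> groups = groupAndSort(rows, column(0), column(1));
        for (Map.Entry<String, List<String[]>> entry : groups.entrySet()) {
            for (String[] row : entry.getValue()) {
                System.out.println(entry.getKey() + " -> " + String.join(",", row));
            }
        }
    }
}
```

For the 54-column file, the extractors would be column(15) for grouping and whichever column you want to sort on within each group.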

Actually we have some use-cases like this for streaming so we are thinking of writing a wrapper
for the array types that would behave as you described.

Regards,
Gyula

> On 21 Oct 2014, at 14:03, Martin Neumann <mneumann@spotify.com> wrote:
> 
> Hej,
> 
> I have a CSV file with 54 columns, each of them a string (for now). I need
> to group and sort them on field 15.
> 
> What's the best way to load the data into Flink?
> There is no Tuple54 (and the <> would look awful anyway with 54 times
> String in it).
> My current idea is to write a Mapper that splits each line into an array of
> Strings. Would grouping and sorting work on this?
> 
> So can I do something like this, or does that only work on tuples:
> DataSet<String[]> ds;
> ds.groupBy(15).sort(20, ANY)
> 
> cheers Martin

