flink-dev mailing list archives

From Gyula Fora <gyf...@apache.org>
Subject Re: how load/group with large csv files
Date Tue, 21 Oct 2014 12:20:56 GMT
I am not sure how you should go about that; let's wait for some feedback from the others.


Until then you can always map the array to (array, keyfield) and use groupBy(1).
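
A minimal sketch of that workaround in the Java DataSet API (the field
indices 15 and 20 follow the thread; the file path, the comma delimiter,
and the final group-reduce are illustrative assumptions, and API details
may vary slightly across Flink versions):

    import org.apache.flink.api.common.functions.GroupReduceFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.util.Collector;

    public class ArrayGroupSortSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Read the 54-column CSV as raw lines and split each line into a String[].
            DataSet<String[]> rows = env.readTextFile("file:///path/to/input.csv") // placeholder path
                .map(new MapFunction<String, String[]>() {
                    @Override
                    public String[] map(String line) {
                        return line.split(","); // assumes a plain comma delimiter
                    }
                });

            // Lift the grouping key (field 15) and the sort key (field 20) into
            // tuple fields so the position-based groupBy/sortGroup can see them.
            DataSet<Tuple3<String[], String, String>> keyed = rows
                .map(new MapFunction<String[], Tuple3<String[], String, String>>() {
                    @Override
                    public Tuple3<String[], String, String> map(String[] arr) {
                        return new Tuple3<String[], String, String>(arr, arr[15], arr[20]);
                    }
                });

            keyed.groupBy(1)                    // group on field 15
                .sortGroup(2, Order.ASCENDING)  // sort within each group on field 20
                .reduceGroup(new GroupReduceFunction<Tuple3<String[], String, String>, String>() {
                    @Override
                    public void reduce(Iterable<Tuple3<String[], String, String>> group,
                                       Collector<String> out) {
                        // Illustrative: emit "key -> record count" per group.
                        String key = null;
                        long count = 0;
                        for (Tuple3<String[], String, String> t : group) {
                            key = t.f1;
                            count++;
                        }
                        out.collect(key + " -> " + count);
                    }
                })
                .writeAsText("file:///path/to/output"); // placeholder path

            env.execute("group/sort a 54-column CSV (sketch)");
        }
    }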


> On 21 Oct 2014, at 14:17, Martin Neumann <mneumann@spotify.com> wrote:
> 
> Hej,
> 
> Unfortunately .sort() cannot take a key extractor; would I have to do the
> sort myself then?
> 
> cheers Martin
> 
> On Tue, Oct 21, 2014 at 2:08 PM, Gyula Fora <gyfora@apache.org> wrote:
> 
>> Hey,
>> 
>> Using arrays is probably a convenient way to do so.
>> 
>> I think groupBy, the way you described it, only works for tuples right
>> now. To do the grouping on the array field, you would need to create a
>> key extractor and pass that to groupBy.
>> 
>> Actually, we have some use cases like this for streaming, so we are
>> thinking of writing a wrapper for the array types that would behave as
>> you described.
>> 
>> Regards,
>> Gyula
>> 
>>> On 21 Oct 2014, at 14:03, Martin Neumann <mneumann@spotify.com> wrote:
>>> 
>>> Hej,
>>> 
>>> I have a CSV file with 54 columns, each of them a string (for now). I
>>> need to group and sort the rows on field 15.
>>> 
>>> What's the best way to load the data into Flink?
>>> There is no Tuple54 (and the <> would look awful anyway with String
>>> repeated 54 times in it).
>>> My current idea is to write a mapper and split each line into an array
>>> of strings. Would grouping and sorting work on this?
>>> 
>>> So can I do something like this, or does that only work on tuples?
>>> DataSet<String[]> ds;
>>> ds.groupBy(15).sort(20, Order.ANY)
>>> 
>>> cheers Martin
>> 
>> 
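
For reference, the key-extractor route mentioned in the quoted message
applies to groupBy (at the time of this thread the group sort could not
take one, which is what prompted the tuple workaround above). A hedged
sketch, reusing the same illustrative CSV parsing:

    import org.apache.flink.api.common.functions.GroupReduceFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.util.Collector;

    public class KeySelectorGroupingSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<String[]> rows = env.readTextFile("file:///path/to/input.csv") // placeholder path
                .map(new MapFunction<String, String[]>() {
                    @Override
                    public String[] map(String line) {
                        return line.split(",");
                    }
                });

            // groupBy accepts a key extractor even for non-tuple types such as String[].
            rows.groupBy(new KeySelector<String[], String>() {
                    @Override
                    public String getKey(String[] record) {
                        return record[15]; // grouping key: field 15
                    }
                })
                .reduceGroup(new GroupReduceFunction<String[], String>() {
                    @Override
                    public void reduce(Iterable<String[]> group, Collector<String> out) {
                        // Illustrative: count the records per key.
                        String key = null;
                        long count = 0;
                        for (String[] r : group) {
                            key = r[15];
                            count++;
                        }
                        out.collect(key + " -> " + count);
                    }
                })
                .writeAsText("file:///path/to/output"); // placeholder path

            env.execute("groupBy with a key extractor (sketch)");
        }
    }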

