flink-user mailing list archives

From Chiwan Park <chiwanp...@icloud.com>
Subject Re: why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort
Date Tue, 02 Jun 2015 00:04:34 GMT
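Hi. The sortGroup API returns a SortedGrouping object, but you don't use the result. Neither
groupBy nor sortGroup modifies customers itself. Each returns a new object, and the grouping only
takes effect when you apply a group-wise operation such as first(n) or reduceGroup to it. I think
you are confusing what groupBy and sortGroup do. You should use the API as follows (I assume you
are using 0.8 or 0.9-milestone-1):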

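// Select the first 10 records of each group, sorted by field 0 in descending order.
DataSet<Customer> sorted = customers.groupBy(2).sortGroup(0, Order.DESCENDING).first(10);
sorted.print();  // in these versions print() only adds a sink; it does not return the data
env.execute();   // the sink is executed here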
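To make the semantics concrete without a cluster, here is a plain-Java sketch (no Flink dependency) of what groupBy(k).sortGroup(f, Order.DESCENDING).first(n) computes: partition the records by key, sort each partition on its own, and keep the first n of each. The Customer record, its fields, and the method names are hypothetical. The sketch groups by the market-segment column (field 4 in the Tuple5 from the question) instead of field 2, only because that column repeats in the sample rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupSortFirstSketch {

    // Hypothetical stand-in for the Customer tuple: field 0 (id), field 1 (name), field 4 (segment).
    record Customer(long id, String name, String segment) {}

    // The five rows printed in the question, reduced to the fields used here.
    static List<Customer> sampleCustomers() {
        return List.of(
            new Customer(1, "Customer#000000001", "BUILDING"),
            new Customer(2, "Customer#000000002", "AUTOMOBILE"),
            new Customer(3, "Customer#000000003", "AUTOMOBILE"),
            new Customer(4, "Customer#000000004", "MACHINERY"),
            new Customer(5, "Customer#000000005", "HOUSEHOLD"));
    }

    // Emulates groupBy(segment).sortGroup(id, DESCENDING).first(n) on a single machine:
    // records are partitioned by key, each partition is sorted independently,
    // and only the first n records of each partition are kept.
    static Map<String, List<Customer>> firstNPerGroup(List<Customer> input, int n) {
        Map<String, List<Customer>> groups = new LinkedHashMap<>();
        for (Customer c : input) {
            groups.computeIfAbsent(c.segment(), k -> new ArrayList<>()).add(c);
        }
        Map<String, List<Customer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Customer>> e : groups.entrySet()) {
            List<Customer> group = new ArrayList<>(e.getValue());
            group.sort(Comparator.comparingLong(Customer::id).reversed());
            result.put(e.getKey(), new ArrayList<>(group.subList(0, Math.min(n, group.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        // One line per group; only the AUTOMOBILE group holds more than one record.
        firstNPerGroup(sampleCustomers(), 10)
            .forEach((segment, rows) -> System.out.println(segment + " -> " + rows));
    }
}
```

Flink itself makes no promise about the order in which groups are emitted; the LinkedHashMap here only keeps the sketch's output deterministic.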

Note that Flink currently supports only local sorting, not a global sort (FLINK-598).
The sortGroup API sorts the elements within each group.
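A second plain-Java sketch, again hypothetical and Flink-free, contrasts the two kinds of ordering. It groups the five sample rows by their field 2 values, which are all distinct in the output shown in the question, so every group holds exactly one record and the per-group sort has nothing to reorder. A single descending order over all rows needs one sort across the whole data set, which is the global sort referred to by FLINK-598. The Row record and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LocalVsGlobalSortSketch {

    // Hypothetical subset of the Customer tuple: field 0 (id) and field 2.
    record Row(long id, String field2) {}

    // Field 0 and field 2 of the five rows printed in the question.
    static List<Row> sampleRows() {
        return List.of(
            new Row(1, "IVhzIApeRb ot&&c&&E"),
            new Row(2, "XSTf4&&NCwDVaWNe6tEgvwfmRchLXak"),
            new Row(3, "MG9kdTD2WBHm"),
            new Row(4, "XxVSJsLAGtn"),
            new Row(5, "KvpyuHCplrB84WgAiGV6sYpZq7Tj"));
    }

    // Local sort (the idea behind sortGroup): group by key, then order each group on its own.
    static Map<String, List<Row>> sortWithinGroups(List<Row> rows, Function<Row, String> key) {
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(key.apply(r), k -> new ArrayList<>()).add(r);
        }
        groups.values().forEach(g -> g.sort(Comparator.comparingLong(Row::id).reversed()));
        return groups;
    }

    // Global sort: a single ordering over all rows, independent of any grouping.
    static List<Row> sortGlobally(List<Row> rows) {
        List<Row> all = new ArrayList<>(rows);
        all.sort(Comparator.comparingLong(Row::id).reversed());
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<Row>> local = sortWithinGroups(sampleRows(), Row::field2);
        // Every field 2 value is distinct, so every group is a singleton and nothing is reordered.
        local.values().forEach(g -> System.out.println("group of size " + g.size() + ": " + g));
        System.out.println("global: " + sortGlobally(sampleRows()));
    }
}
```

In a distributed job the same distinction holds: sortGroup orders records inside each group, while the order across groups and parallel tasks is not something it controls.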


Regards,
Chiwan Park

> On Jun 2, 2015, at 5:02 AM, hagersaleh <loveallah1987@yahoo.com> wrote:
> 
> Why, when I use groupBy(2).sortGroup(0, Order.DESCENDING), is the data
> neither grouped nor sorted?
> 
> I want to sort a DataSet. How can I do that?
> 
> customers = customers.filter(
>         new FilterFunction<Customer>() {
>             @Override
>             public boolean filter(Customer c) {
>                 return Integer.parseInt(c.getField(0).toString()) <= 5;
>             }
>         });
> 
> customers.groupBy(2).sortGroup(0, Order.DESCENDING);
> System.out.println(customers.print());
> customers.writeAsCsv("/home/hadoop/Desktop/Dataset/output.csv", "\n", "|");
> env.execute();
> 
> 
> public static class Customer extends Tuple5<Long, String, String, String, String> {
> }
> 
> private static DataSet<Customer> getCustomerDataSet(ExecutionEnvironment env) {
>     return env.readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
>             .fieldDelimiter('|')
>             .includeFields("11100110")
>             .ignoreFirstLine()
>             .tupleType(Customer.class);
> }
> 
> The result is not sorted:
> 2> (1,Customer#000000001,IVhzIApeRb ot&&c&&E,711.56,BUILDING)
> 2> (2,Customer#000000002,XSTf4&&NCwDVaWNe6tEgvwfmRchLXak,121.65,AUTOMOBILE)
> 2> (3,Customer#000000003,MG9kdTD2WBHm,7498.12,AUTOMOBILE)
> 2> (4,Customer#000000004,XxVSJsLAGtn,2866.83,MACHINERY)
> 2> (5,Customer#000000005,KvpyuHCplrB84WgAiGV6sYpZq7Tj,794.47,HOUSEHOLD)
> 
> 
> 
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/why-when-use-groupBy-2-sortGroup-0-Order-DESCENDING-not-group-by-and-not-sort-tp1436.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.