flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Darshan Singh <darshan.m...@gmail.com>
Subject Re: how to query the output of the scalar table function
Date Wed, 04 Apr 2018 17:09:18 GMT
Thanks Fabian

We are going to replace all scalar functions with the table functions.


On Wed, Apr 4, 2018 at 12:16 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Darshan,
> What you observe is the result of what's supposed to be an optimization.
> By fusing the two select() calls, we reduce the number of operators in the
> resulting plan (one MapFunction less).
> This optimization is only applied for ScalarFunctions but not for
> TableFunctions.
> With a better cost-modeling that increases the cost of user-defined
> ScalarFunctions it should be possible to "convince" the optimizer to not
> fuse the operators.
> For now, you could either use the TableFunction approach or convert the
> result of the first select() into a DataStream (or DataSet depending on
> your setup) and register that again as a table.
> That would split the plan into two parts which are independently optimized
> and hence the select() operators would not be merged.
> Best, Fabian
> 2018-03-30 22:07 GMT+02:00 Darshan Singh <darshan.meel@gmail.com>:
>> Hi,
>> I am not able to find what is best way to query the output of a scalar
>> table function.
>> Suppose I have table which has column col1 which is string.
>> I have a scalar function and returns a POJO
>> {col1V1 String, col1V2 String , col1V3 String}.
>> I am using following.
>> so table.select("sf(col1) as sfc")
>> .select("sfc.get('col1V1') as v1, sfc.get('col1V2') as v2 ,
>> sfc.get('col1V13') as v3 ")
>> It is working fine but strangely enough it calls the sf 3 times(each per
>> get) for same row. However, If I use table function it calls just once.
>> So to me it seems that scalar function is very expensive so if I have 10
>> columns and the computation is expensive it will do 10 times. Thats why I
>> thought maybe there is a better way to return pojo from scalar function
>> rather than what I have been using.
>> If this is the best way then I wonder if scalar functions should be used
>> to return single value only.
>> Thanks

View raw message