flink-user mailing list archives

From Congxian Qiu <qcx978132...@gmail.com>
Subject Re: Broadcast state
Date Wed, 09 Oct 2019 07:13:10 GMT
Hi,

Once you use Redis, why do you still need to worry about eliminating duplicate data?
If you store the entries under the same key, Redis deduplicates them for you.
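A minimal plain-Java sketch of this point, assuming Redis is used as a simple key-value store with `SET` semantics (a later write to the same key replaces the earlier value). The static `ConcurrentHashMap` named `redis` is a hypothetical in-process stand-in, not a Redis client: two operators writing the same key still leave exactly one entry, and both read the same single copy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class KeyedStoreDedup {
    // Hypothetical stand-in for Redis: SET key value overwrites any previous value.
    static final Map<String, String> redis = new ConcurrentHashMap<>();

    // Models SET: an upsert by key, so writing the same key twice keeps one entry.
    static void set(String key, String value) {
        redis.put(key, value);
    }

    // Models GET: any operator that knows the key reads the single stored copy.
    static String get(String key) {
        return redis.get(key);
    }

    public static void main(String[] args) {
        set("config:42", "v1"); // written by hypothetical operator A
        set("config:42", "v2"); // written by hypothetical operator B with the same key
        System.out.println(redis.size());     // 1
        System.out.println(get("config:42")); // v2
    }
}
```

The data lives once, outside the Flink tasks, so adding a second consuming operator adds lookups rather than another copy.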

Best,
Congxian


On Wed, Oct 2, 2019 at 5:30 PM Fabian Hueske <fhueske@gmail.com> wrote:

> Hi,
>
> State is always associated with a single task in Flink.
> The state of a task cannot be accessed by other tasks of the same operator
> or tasks of other operators.
> This is true for every type of state, including broadcast state.
>
> Best, Fabian
>
>
> On Tue, Oct 1, 2019 at 08:22 Navneeth Krishnan <reachnavneeth2@gmail.com>
> wrote:
>
>> Hi,
>>
>> I can use Redis, but I’m still having a hard time figuring out how I can
>> eliminate duplicate data. Today, on 1.4 without broadcast state, I’m using
>> a cache to lazy-load the data. I thought broadcast state would be similar
>> to that of Kafka Streams, where I have read access to the state across the
>> pipeline. That would indeed solve a lot of problems. Is there some way I can
>> do the same with Flink?
>>
>> Thanks!
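For comparison, the lazy-loading cache described in this message might look roughly like the following plain-Java sketch. The loader function and key names are hypothetical; `computeIfAbsent` runs the loader only on the first request for a key and returns the cached value afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazyCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;              // hypothetical: e.g. a database or service lookup
    final AtomicInteger loads = new AtomicInteger();  // counts how often the loader actually ran

    LazyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Loads the value on the first request for a key; later requests are served from the map.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    public static void main(String[] args) {
        LazyCache<String, Integer> cache = new LazyCache<>(String::length);
        cache.get("alpha");
        cache.get("alpha"); // cache hit, loader not called again
        cache.get("beta");
        System.out.println(cache.loads.get()); // 2
    }
}
```

As written, every object that holds its own `LazyCache` keeps its own entries, so this alone does not remove duplication across operators.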
>>
>> On Mon, Sep 30, 2019 at 10:36 PM Congxian Qiu <qcx978132955@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Could you use a cache system such as HBase or Redis to store this
>>> data, and query it from the cache when needed?
>>>
>>> Best,
>>> Congxian
>>>
>>>
>>> On Tue, Oct 1, 2019 at 10:15 AM Navneeth Krishnan <reachnavneeth2@gmail.com>
>>> wrote:
>>>
>>>> Thanks Oytun. The problem with doing that is that the same data will have
>>>> to be stored multiple times, wasting memory. In my case there will be around
>>>> a million entries, which need to be used by at least two operators for now.
>>>>
>>>> Thanks
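The memory concern can be made concrete with a back-of-the-envelope calculation. State is kept per task, so every parallel instance of every operator that holds the broadcast data stores a full copy. The sketch below assumes a hypothetical parallelism of 4 per operator; the million entries and the two operators come from the message.

```java
class BroadcastCopies {
    // Each parallel task of each consuming operator keeps its own full copy of the data.
    static long storedEntries(long entries, int parallelism, int operators) {
        return entries * parallelism * operators;
    }

    public static void main(String[] args) {
        long entries = 1_000_000L; // "around a million entries"
        int parallelism = 4;       // hypothetical parallelism per operator
        int operators = 2;         // "at least two operators"
        System.out.println(storedEntries(entries, parallelism, operators)); // 8000000
    }
}
```

Under these assumptions the job holds eight million entries in total, and the figure grows linearly with both parallelism and the number of consuming operators.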
>>>>
>>>> On Mon, Sep 30, 2019 at 5:42 PM Oytun Tez <oytun@motaword.com> wrote:
>>>>
>>>>> This is how we currently use broadcast state. Our states are reusable
>>>>> (code-wise): every operator that wants to consume the data basically keeps
>>>>> the same descriptor state locally, by writing each broadcast element into
>>>>> its own local state in processBroadcastElement.
>>>>>
>>>>> I am open to suggestions. I see this as a hard drawback, but is it a
>>>>> drawback of dataflow programming or of the Flink framework?
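A plain-Java model of the pattern described in this message, with no Flink dependency: one reusable descriptor definition, and every consuming operator applying each broadcast element to its own local copy, much as a `processBroadcastElement` implementation would. The `Descriptor` and `ConsumingOperator` classes are hypothetical stand-ins, not Flink's `MapStateDescriptor` or `KeyedBroadcastProcessFunction`; the point is only that the copies end up equal but separate.

```java
import java.util.HashMap;
import java.util.Map;

class ReusableBroadcastState {
    // Defined once and reused by every operator that needs the data.
    static final Descriptor<String, Integer> RULES =
            new Descriptor<>("rules", String.class, Integer.class);

    public static void main(String[] args) {
        ConsumingOperator first = new ConsumingOperator(RULES);
        ConsumingOperator second = new ConsumingOperator(RULES);

        // The same broadcast stream feeds both operators, so each sees every element.
        for (ConsumingOperator op : new ConsumingOperator[] {first, second}) {
            op.processBroadcastElement("threshold", 10);
        }

        System.out.println(first.localState.equals(second.localState)); // true
        System.out.println(first.localState == second.localState);      // false
    }
}

// Hypothetical stand-in for a shared state descriptor (name plus key/value types).
final class Descriptor<K, V> {
    final String name;
    final Class<K> keyType;
    final Class<V> valueType;

    Descriptor(String name, Class<K> keyType, Class<V> valueType) {
        this.name = name;
        this.keyType = keyType;
        this.valueType = valueType;
    }
}

// Hypothetical consuming operator: keeps its own local copy of the broadcast data.
class ConsumingOperator {
    final Descriptor<String, Integer> descriptor;
    final Map<String, Integer> localState = new HashMap<>();

    ConsumingOperator(Descriptor<String, Integer> descriptor) {
        this.descriptor = descriptor;
    }

    // Mirrors a processBroadcastElement implementation: write the element into local state.
    void processBroadcastElement(String key, Integer value) {
        localState.put(key, value);
    }
}
```

The descriptor (the code) is shared, but the state (the data) is not, which is exactly the memory cost raised in the reply.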
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Oytun Tez
>>>>>
>>>>> *M O T A W O R D*
>>>>> The World's Fastest Human Translation Platform.
>>>>> oytun@motaword.com — www.motaword.com
>>>>>
>>>>>
>>>>> On Mon, Sep 30, 2019 at 8:40 PM Oytun Tez <oytun@motaword.com>
>>>>> wrote:
>>>>>
>>>>>> You can reuse the broadcast state (along with its descriptor) that
>>>>>> comes into your KeyedBroadcastProcessFunction in another operator
>>>>>> downstream. That basically duplicates the broadcast state in whichever
>>>>>> operator you want to use it, every time.
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 30, 2019 at 8:29 PM Navneeth Krishnan <
>>>>>> reachnavneeth2@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Is it possible to access a broadcast state across the pipeline? For
>>>>>>> example, say I have a KeyedBroadcastProcessFunction which adds the
>>>>>>> incoming data to state, and I have a downstream operator where I need
>>>>>>> the same state as well. Would I be able to just read the broadcast
>>>>>>> state through a read-only view? I know this is possible in Kafka Streams.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
