flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Sharing Java Collections within Flink Cluster
Date Wed, 07 Sep 2016 17:33:20 GMT
Is writing DataStream2 to a Kafka topic and reading it from the other job
an option?

2016-09-07 19:07 GMT+02:00 Chakravarthy varaga <chakravarthyvp@gmail.com>:

> Hi Fabian,
>
>     Thanks for your response. Apparently these DataStream
> (Job1-DataStream1 & Job2-DataStream2) are from different flink applications
> running within the same cluster.
>     DataStream2 (from Job2) applies transformations and updates a 'cache'
> on which (Job1) needs to work on.
>     Our intention is to not use the external key/value store as we are
> trying to localize the cache within the cluster.
>     Is there a way?
>
> Best Regards
> CVP
>
> On Wed, Sep 7, 2016 at 5:00 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> Hi,
>>
>> Flink does not provide shared state.
>> However, you can broadcast a stream to CoFlatMapFunction, such that each
>> operator has its own local copy of the state.
>>
>> If that does not work for you because the state is too large and if it is
>> possible to partition the state (and both streams), you can also use keyBy
>> instead of broadcast.
>>
>> Finally, you can use an external system like a KeyValue Store or
>> In-Memory store like Apache Ignite to hold your distributed collection.
>>
>> Best, Fabian
>>
>> 2016-09-07 17:49 GMT+02:00 Chakravarthy varaga <chakravarthyvp@gmail.com>
>> :
>>
>>> Hi Team,
>>>
>>>      Can someone help me here? Appreciate any response !
>>>
>>> Best Regards
>>> Varaga
>>>
>>> On Mon, Sep 5, 2016 at 4:51 PM, Chakravarthy varaga <
>>> chakravarthyvp@gmail.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>>     I'm working on a Flink Streaming application. The data is injected
>>>> through Kafka connectors. The payload volume is roughly 100K/sec. The event
>>>> payload is a string. Let's call this as DataStream1.
>>>> This application also uses another DataStream, call it DataStream2,
>>>> (consumes events off a kafka topic). The elements of this DataStream2
>>>> involves in a certain transformation that finally updates a Hashmap(/Java
>>>> util Collection). Apparently the flink application should share this
>>>> HashMap across the flink cluster so that DataStream1 application could
>>>> check the state of the values in this collection. Is there a way to do this
>>>> in Flink?
>>>>
>>>>     I don't see any Shared Collection used within the cluster?
>>>>
>>>> Best Regards
>>>> CVP
>>>>
>>>
>>>
>>
>

Mime
View raw message