flink-user mailing list archives

From Jingsong Li <jingsongl...@gmail.com>
Subject Re: Not able to implement an usecase
Date Tue, 12 May 2020 08:04:12 GMT
Thanks, Roman, for looping me in.

Hi Jaswin,

FLIP-115 [1] will complete the Kafka -> Hive/Filesystem path, and it will be
released in 1.11.

We will provide two connectors in the Table API:
- a file system connector, which manages partitions and files by file
system paths. You can define a file system table with a Parquet/ORC format;
this should be consistent with Hive, except for Hive metastore support.
- a Hive connector, which manages partitions and files through the Hive
metastore and supports automatically adding partitions to the metastore.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table
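
To give a flavor of the file system connector described above, here is a hypothetical DDL sketch based on the FLIP-115 draft. The table name, columns, path, and source table are made up for illustration, and the exact option syntax may still change before the 1.11 release:

```sql
-- Hypothetical sketch based on the FLIP-115 draft; option names and
-- syntax may differ in the final 1.11 release.
CREATE TABLE fs_sink (
  order_id STRING,
  amount   DOUBLE,
  dt       STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs://namenode:8020/warehouse/fs_sink',
  'format' = 'parquet'
);

-- Continuously stream a Kafka-backed table into the partitioned files:
INSERT INTO fs_sink SELECT order_id, amount, dt FROM kafka_source;
```

The Hive connector variant would look similar but register partitions in the Hive metastore instead of relying on directory paths alone.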

Best,
Jingsong Lee

On Tue, May 12, 2020 at 3:52 PM Khachatryan Roman <
khachatryan.roman@gmail.com> wrote:

> AFAIK, yes, you can write streams.
>
> I'm pulling in Jingsong Li and Rui Li as they might know better.
>
> Regards,
> Roman
>
>
> On Mon, May 11, 2020 at 10:21 PM Jaswin Shah <jaswin.shah@outlook.com>
> wrote:
>
>> If I go with the Table API, can I write the streams to Hive, or is it only
>> for batch processing as of now?
>>
>>
>> ------------------------------
>> *From:* Khachatryan Roman <khachatryan.roman@gmail.com>
>> *Sent:* Tuesday, May 12, 2020 1:49:10 AM
>> *To:* Jaswin Shah <jaswin.shah@outlook.com>
>> *Cc:* user@flink.apache.org <user@flink.apache.org>
>> *Subject:* Re: Not able to implement an usecase
>>
>> Hi Jaswin,
>>
>> Currently, the DataStream API doesn't support outer joins.
>> As a workaround, you can use the coGroup function [1].
>>
>> Hive is also not supported by the DataStream API, though it is supported by
>> the Table API [2].
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/functions/CoGroupFunction.html
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/read_write_hive.html
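
The reason coGroup works as an outer-join workaround is that, unlike a join, it is invoked once per key and window with *all* elements from both sides, even when one side is empty. A minimal plain-Java sketch of the per-key logic such a CoGroupFunction body could implement (the method name and string tagging are hypothetical, not from this thread):

```java
import java.util.ArrayList;
import java.util.List;

public class OuterJoinSketch {

    // Mirrors the body of a CoGroupFunction#coGroup(left, right, out):
    // both lists belong to the same key and window; an empty side means
    // the other side's elements are unmatched.
    static List<String> coGroupFullOuter(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        if (left.isEmpty()) {
            for (String r : right) {
                out.add("RIGHT_ONLY:" + r);      // message only in right stream
            }
        } else if (right.isEmpty()) {
            for (String l : left) {
                out.add("LEFT_ONLY:" + l);       // message only in left stream
            }
        } else {
            for (String l : left) {
                for (String r : right) {
                    out.add("JOINED:" + l + "," + r);  // the inner-join part
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(coGroupFullOuter(List.of("a"), List.of("x")));
        System.out.println(coGroupFullOuter(List.of("a"), List.of()));
    }
}
```

In an actual Flink job this logic would sit inside a CoGroupFunction applied via `leftStream.coGroup(rightStream).where(...).equalTo(...).window(...)`, with the unmatched branches routed wherever the discrepant results should go.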
>>
>> Regards,
>> Roman
>>
>>
>> On Mon, May 11, 2020 at 6:03 PM Jaswin Shah <jaswin.shah@outlook.com>
>> wrote:
>>
>> Hi,
>> I want to implement the following use case in my application:
>> I am doing an interval join between two data streams and then, in a process
>> function, catching the discrepant results of the join. The join is done on
>> the key orderId. Now, I want to identify all the messages in both data streams
>> that are not joined. That is, for a message in the left stream, if I do not
>> find any message in the right stream over the defined interval, that
>> message should be caught, and the same for the right stream: if there are
>> messages that have no corresponding message in the left stream, catch
>> them. I need help with how I can achieve this use case. I know this could be
>> done with an outer join, but interval joins and tumbling event-time window
>> joins only support inner joins as far as I know. I do not want to use the
>> Table/SQL API here but want to work with the DataStream API only.
>>
>> Currently I am using the following, which works for 90% of the cases, but
>> the 10% of cases where a large delay occurs and messages are missing from
>> the left or right stream are not handled by this implementation:
>>
>> /**
>>  * Join cart and pg streams on mid and orderId, over the specified interval.
>>  *
>>  * @param leftStream  the cart message stream
>>  * @param rightStream the PG message stream
>>  * @param parameter   holds the interval join bounds
>>  * @return the joined result stream
>>  */
>> public SingleOutputStreamOperator<ResultMessage> intervalJoinCartAndPGStreams(
>>         DataStream<CartMessage> leftStream, DataStream<PGMessage> rightStream, ParameterTool parameter) {
>>     // Discrepant results are sent to Kafka from CartPGProcessFunction.
>>     return leftStream
>>         .keyBy(new CartJoinColumnsSelector())
>>         .intervalJoin(rightStream.keyBy(new PGJoinColumnsSelector()))
>>         .between(
>>             Time.milliseconds(Long.parseLong(parameter.get(Constants.INTERVAL_JOIN_LOWERBOUND))),
>>             Time.milliseconds(Long.parseLong(parameter.get(Constants.INTERVAL_JOIN_UPPERBOUND))))
>>         .process(new CartPGProcessFunction());
>> }
>>
>>
>>
>> Secondly, I am unable to find streaming support for writing the data
>> streams I am reading from Kafka out to Hive, which I then want to batch
>> process with Flink.
>>
>> Please help me resolve these use cases.
>>
>> Thanks,
>> Jaswin
>>
>>
>>
>>

-- 
Best, Jingsong Lee
