flink-issues mailing list archives

From "Shimin Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10245) Add DataStream HBase Sink
Date Mon, 10 Sep 2018 02:30:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608642#comment-16608642 ]

Shimin Yang commented on FLINK-10245:

Hi [~hequn8128],

Regarding the comments you made last time: I looked into the HBase client implementation,
and I think I can add a scheduler that flushes the data periodically, at an interval set by the user.
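
A minimal sketch of the periodic flush idea, written against a hypothetical in-memory buffer rather than the real HBase client. The class name `PeriodicFlusher`, the `writer` callback and the `intervalMillis` parameter are assumptions made for illustration; in an actual sink the callback would presumably hand records to the HBase client and trigger its flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for a buffering writer; not part of Flink or HBase. */
class PeriodicFlusher<T> implements AutoCloseable {
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> writer;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicFlusher(Consumer<List<T>> writer, long intervalMillis) {
        this.writer = writer;
        // Flush on a fixed delay chosen by the user. Note that an exception
        // thrown by flush() would cancel all later scheduled runs.
        scheduler.scheduleWithFixedDelay(
                this::flush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Buffers one record; synchronized because the timer thread also touches the buffer. */
    synchronized void add(T record) {
        buffer.add(record);
    }

    /** Hands a copy of the buffered records to the writer and clears the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        writer.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    /** Stops the timer and flushes whatever is still buffered. */
    @Override
    public void close() {
        scheduler.shutdown();
        flush();
    }
}
```

The sketch flushes on `close()` as well, so records that arrive after the last timer tick are not lost on shutdown. A real Flink sink would probably also need to flush on checkpoints, which is not shown here.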

I am not sure whether I should replace the current API with the HBase batch API, since the
current one already provides buffering and flush functionality.

If I stick with the current API, I think it will be hard to deduplicate data by rowkey, because
the data is buffered inside the BufferedMutator in the HBase client and there is no function for
deleting a mutation from it.
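
One possible way around this, sketched under assumptions, would be to deduplicate on the sink side before anything is handed to the client buffer. The class `RowKeyDedupBuffer` below is hypothetical and uses a `String` rowkey for simplicity; HBase rowkeys are `byte[]`, which would need a wrapper with value equality to serve as a map key. This is not the approach proposed in the issue, only an illustration of last-write-wins deduplication by key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sink-side buffer that keeps only the latest mutation per rowkey. */
class RowKeyDedupBuffer<V> {
    // LinkedHashMap keeps keys in order of first insertion.
    private final Map<String, V> pending = new LinkedHashMap<>();

    /** Stores the mutation, replacing any earlier one for the same rowkey. */
    void put(String rowKey, V mutation) {
        pending.put(rowKey, mutation);
    }

    /** Number of distinct rowkeys currently buffered. */
    int size() {
        return pending.size();
    }

    /** Returns the deduplicated mutations and empties the buffer. */
    List<V> drain() {
        List<V> out = new ArrayList<>(pending.values());
        pending.clear();
        return out;
    }
}
```

The drained list could then be passed to the client in one call at flush time. The trade-off is that buffering happens twice, once here and once in the client, so memory use and flush timing would need care.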

What do you think?



> Add DataStream HBase Sink
> -------------------------
>                 Key: FLINK-10245
>                 URL: https://issues.apache.org/jira/browse/FLINK-10245
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming Connectors
>            Reporter: Shimin Yang
>            Assignee: Shimin Yang
>            Priority: Major
>              Labels: pull-request-available
> Design documentation: [https://docs.google.com/document/d/1of0cYd73CtKGPt-UL3WVFTTBsVEre-TNRzoAt5u2PdQ/edit?usp=sharing]

This message was sent by Atlassian JIRA
