flink-issues mailing list archives

From "PJ Van Aeken (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-2055) Implement Streaming HBaseSink
Date Mon, 01 Feb 2016 14:11:39 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126231#comment-15126231 ]

PJ Van Aeken edited comment on FLINK-2055 at 2/1/16 2:11 PM:
-------------------------------------------------------------

Indeed, the example you described uses the native client API, which I think is the way
to go. Unfortunately, HTable is now deprecated, so the examples are outdated. The mailing-list
thread linked in the issue description instead suggests using the write method on DataStream
combined with TableOutputFormat.

https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/datastream/DataStream.html#write%28org.apache.flink.api.common.io.OutputFormat,%20long%29

What I propose instead is a SinkFunction (like the one we have for Flume, for instance) that
uses the new HBase client APIs, similar to how the example you referred to used to work. This
would avoid TableOutputFormat, which, as far as I understand, buffers requests on the client
side based on internal heuristics, as described in the HBase documentation:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html
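The difference between writing each record immediately and buffering on the client side can be sketched in plain Java. This is a hypothetical illustration only: it does not use the real Flink or HBase classes, and `MutationClient`, `DirectSink`, `BufferedSink` and the byte threshold are made-up stand-ins. `DirectSink` sends every record as soon as it arrives, roughly what a per-record SinkFunction would do, while `BufferedSink` holds records until their total size exceeds a threshold, similar in spirit to BufferedMutator's write buffer (the real flush heuristics may differ between HBase versions).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: no Flink or HBase classes are used here. */
public class BufferingSinkSketch {

    /** Hypothetical stand-in for a table client that sends a batch over the wire. */
    interface MutationClient {
        void send(List<String> batch);
    }

    /** Test double that remembers every batch it was asked to send. */
    static class RecordingClient implements MutationClient {
        final List<List<String>> batches = new ArrayList<>();

        @Override
        public void send(List<String> batch) {
            batches.add(new ArrayList<>(batch)); // copy, since callers may reuse the list
        }
    }

    /** Sends each record immediately, one request per record. */
    static class DirectSink {
        private final MutationClient client;

        DirectSink(MutationClient client) { this.client = client; }

        void invoke(String record) {
            client.send(Collections.singletonList(record));
        }
    }

    /** Buffers records and sends them only once the buffered bytes exceed a threshold. */
    static class BufferedSink {
        private final MutationClient client;
        private final long writeBufferSize; // hypothetical threshold in bytes
        private final List<String> buffer = new ArrayList<>();
        private long bufferedBytes = 0;

        BufferedSink(MutationClient client, long writeBufferSize) {
            this.client = client;
            this.writeBufferSize = writeBufferSize;
        }

        void invoke(String record) {
            buffer.add(record);
            bufferedBytes += record.getBytes(StandardCharsets.UTF_8).length;
            if (bufferedBytes > writeBufferSize) {
                flush();
            }
        }

        /** Sends whatever is buffered; a real sink would also need this on close. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            client.send(buffer);
            buffer.clear();
            bufferedBytes = 0;
        }
    }

    public static void main(String[] args) {
        RecordingClient direct = new RecordingClient();
        RecordingClient buffered = new RecordingClient();
        DirectSink directSink = new DirectSink(direct);
        BufferedSink bufferedSink = new BufferedSink(buffered, 8);

        for (String row : new String[] {"row-1", "row-2", "row-3"}) {
            directSink.invoke(row);
            bufferedSink.invoke(row);
        }
        System.out.println("direct batches:   " + direct.batches);   // [[row-1], [row-2], [row-3]]
        System.out.println("buffered batches: " + buffered.batches); // [[row-1, row-2]]
        bufferedSink.flush(); // without this, row-3 would never reach the client
        System.out.println("after flush:      " + buffered.batches); // [[row-1, row-2], [row-3]]
    }
}
```

With an 8-byte threshold, the direct sink issues three single-record requests, while the buffered sink sends row-1 and row-2 together and keeps row-3 in memory until flush() is called. That is the trade-off in question: fewer round trips, but records only reach the table when the buffer is flushed.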

EDIT: There appears to be a version mismatch, which is why we are not seeing the same problems.
It turns out my assumptions do not hold for version 0.98.x; I am unsure about 1.x for now, and
they definitely hold for 2.x, which is currently a snapshot. So the inner workings of
TableOutputFormat have changed in recent versions, which introduces the problem I described.



> Implement Streaming HBaseSink
> -----------------------------
>
>                 Key: FLINK-2055
>                 URL: https://issues.apache.org/jira/browse/FLINK-2055
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming, Streaming Connectors
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Hilmi Yildirim
>
> As per: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
