beam-commits mailing list archives

From "Solomon Duskis (JIRA)" <>
Subject [jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector
Date Thu, 26 Oct 2017 00:29:00 GMT


Solomon Duskis commented on BEAM-2955:

The problem is that Cloud Bigtable needs the following things:

# A different method for splitting.
# A different configuration mechanism for Cloud Bigtable-specific configuration.  The configuration mechanism would also require the use of ValueProvider for templating purposes.
# A custom Cloud Bigtable-oriented metric for expressing throttling.
# A custom way to use MultiRowRangeFilter (which is different between Cloud Bigtable and HBase).

There are probably other differences I'm missing.

A Service works for item 1 above (splitting), but not for the rest.  There is definitely room for reuse, but
I'm not sure if passing a Service to HBaseIO is the right way to do it.
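
To illustrate why a Service can cover the splitting difference, here is a minimal sketch of a pluggable splitting hook. Everything in it is hypothetical: SplitService, KeyRange, and splitAt are names invented for this example and are not part of HBaseIO or BigtableIO, and the boundary keys are made up. It assumes that one backend supplies region start keys and the other supplies sampled row keys. It does not attempt to cover configuration, metrics, or MultiRowRangeFilter, which is the part a Service does not address.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Half-open row-key range [start, end); an empty end means "unbounded". */
    public static final class KeyRange {
        public final String start;
        public final String end;

        public KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Hypothetical hook; NOT an existing HBaseIO or BigtableIO interface. */
    public interface SplitService {
        List<KeyRange> split(KeyRange scan);
    }

    /** Cuts the scan at every sorted boundary key that falls strictly inside it. */
    public static List<KeyRange> splitAt(KeyRange scan, List<String> sortedBoundaries) {
        List<KeyRange> out = new ArrayList<>();
        String cursor = scan.start;
        for (String b : sortedBoundaries) {
            boolean afterCursor = b.compareTo(cursor) > 0;
            boolean beforeEnd = scan.end.isEmpty() || b.compareTo(scan.end) < 0;
            if (afterCursor && beforeEnd) {
                out.add(new KeyRange(cursor, b));
                cursor = b;
            }
        }
        // The last piece runs from the final boundary to the end of the scan.
        out.add(new KeyRange(cursor, scan.end));
        return out;
    }

    public static void main(String[] args) {
        KeyRange scan = new KeyRange("a", "m");
        // Made-up boundaries standing in for region start keys (assumption).
        SplitService regionBased = s -> splitAt(s, List.of("g", "p"));
        // Made-up boundaries standing in for sampled row keys (assumption).
        SplitService sampleBased = s -> splitAt(s, List.of("c", "k", "t"));
        System.out.println(regionBased.split(scan)); // [[a,g), [g,m)]
        System.out.println(sampleBased.split(scan)); // [[a,c), [c,k), [k,m)]
    }
}
```

The reader code only sees SplitService, so either strategy can be swapped in. The other three items change how the connector is configured and instrumented, which this kind of hook does not reach.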

> Create a Cloud Bigtable HBase connector
> ---------------------------------------
>                 Key: BEAM-2955
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-gcp
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
> The Cloud Bigtable (CBT) team has maintained a Dataflow connector in a different repo for a while. Recently, we did some reworking of the Cloud Bigtable client that would allow it to better coexist in the Beam ecosystem, and we also released a Beam connector in our repository that exposes HBase idioms rather than the Protobuf idioms of BigtableIO.  More information about the customer experience of the HBase connector can be found here: [].
> The Beam repo is a much better place to house a Cloud Bigtable HBase connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase Maven project.  We can create a new connector that extends HBaseIO for the purposes of CBT.  We would have to add some features to HBaseIO to make that work (dynamic rebalancing, and a way for HBase's and CBT's size estimation models to coexist).
> # The BigtableIO connector works well, and we can add an adapter layer on top of it.  I have a proof of concept of it here: [].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion about the right approach.

This message was sent by Atlassian JIRA
