flink-user mailing list archives

From Sameer W <sam...@axiomine.com>
Subject Re: Guys, How to use JDBC connection in Flink using Scala
Date Sat, 24 Sep 2016 16:55:14 GMT
Using plain JDBC against Redshift will be slow for any reasonable data
volume, but if you need to do that, you can open a connection to it from a
RichFunction's open() method.
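A minimal sketch of that pattern in Scala. Note this is illustrative only: the class below mirrors the open()/invoke()/close() lifecycle of Flink's RichSinkFunction without extending the real Flink class, and the connection URL, table, and columns are placeholder assumptions.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}

// Sketch only: in a real job you would extend Flink's RichSinkFunction
// and Flink would drive this lifecycle. Table/column names are made up.
class RedshiftJdbcSink(jdbcUrl: String, user: String, password: String) {

  private var conn: Connection = _
  private var stmt: PreparedStatement = _

  // Called once per parallel task before any records arrive
  // (the role of RichFunction.open()).
  def open(): Unit = {
    conn = DriverManager.getConnection(jdbcUrl, user, password)
    stmt = conn.prepareStatement(
      "INSERT INTO events (id, payload) VALUES (?, ?)")
  }

  // Called once per record (the role of SinkFunction.invoke()).
  def invoke(id: Long, payload: String): Unit = {
    stmt.setLong(1, id)
    stmt.setString(2, payload)
    stmt.executeUpdate()
  }

  // Called on shutdown; release JDBC resources.
  def close(): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}
```

Opening the connection in open() rather than per record matters: invoke() runs for every element, and creating a connection each time would be far slower still.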

I wrote a blog article a while back on how the Spark-Redshift package works -
https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html

This image captures the internal read path in Spark-Redshift -
https://databricks.com/wp-content/uploads/2015/10/image01.gif - and this one
captures the write path -
https://databricks.com/wp-content/uploads/2015/10/image00.gif

In your case, you can read from the Kafka sources, partition the data
appropriately for Redshift, write the partitions to an S3 bucket, and then
invoke the COPY command in Redshift to load the data from the S3 bucket.
This is the same process the above-mentioned blog article describes,
performed explicitly.
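The final step above is issuing a COPY statement to Redshift. A hedged sketch of assembling it in Scala - the table name, bucket path, IAM role ARN, and format options below are all placeholder assumptions, not values from this thread:

```scala
// Hypothetical helper that assembles a Redshift COPY statement loading
// from an S3 prefix. All argument values here are illustrative.
def buildCopyCommand(table: String, s3Path: String, iamRole: String): String =
  s"COPY $table FROM '$s3Path' IAM_ROLE '$iamRole' FORMAT AS CSV GZIP"

val cmd = buildCopyCommand(
  "events",
  "s3://my-bucket/flink-output/",
  "arn:aws:iam::123456789012:role/redshift-load")
```

The resulting string would then be executed over the same JDBC connection; COPY is a single statement, so the per-row slowness of plain JDBC inserts does not apply.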

Sameer




On Sat, Sep 24, 2016 at 12:32 PM, ram kumar <ramkumar09951@gmail.com> wrote:

> Many Thanks Felix.
>
> *Flink use case:*
>
>
>
> Extract data from the source (*Kafka*) and load it into the targets (*AWS S3
> and Redshift*).
>
> We use SCD2 in Redshift, since data changes need to be captured in the
> Redshift target.
>
>
>
> To connect to Redshift (for the staging and production databases), I need to
> set up a JDBC connection in Flink using Scala.
>
>
>
> *Kafka (Source) ------------> Flink (JDBC) ------------> AWS (S3 and
> Redshift) Target*
>
>
>
> Could you please suggest the best approach for this use case?
>
>
> Regards
>
> Ram.
>
>
>
> On 24 September 2016 at 16:14, Felix Dreissig <f30@f30.me> wrote:
>
>> Hi Ram,
>>
>> On 24 Sep 2016, at 16:08, ram kumar <ramkumar09951@gmail.com> wrote:
>> > I am wondering whether it is possible to use a JDBC connection or URL
>> > as a source or target in Flink using Scala.
>> > Could someone kindly help me with this? If you have any sample code,
>> > please share it here.
>>
>> What’s your intended use case? Getting changes from a database or REST
>> API into a data stream for processing in Flink?
>>
>> If so, you could use a change data capture tool to write your changes to
>> Kafka and then let Flink receive them from there. There are, for example,
>> Bottled Water [1] for Postgres, and Maxwell [2] and Debezium [3] for MySQL.
>> For REST, I suppose you’d have to periodically query the API and
>> determine changes yourself. I don’t know if there are any tools to help you
>> with that.
>>
>> Regards,
>> Felix
>>
>> [1] https://github.com/confluentinc/bottledwater-pg
>> [2] http://maxwells-daemon.io/
>> [3] http://debezium.io/
>
>
>
