beam-user mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: Reducing database connection with JdbcIO
Date Wed, 14 Mar 2018 15:00:40 GMT
Hi Derek,

I think you might be interested in:

https://github.com/apache/beam/pull/4461

related to BEAM-3500.

I introduced an internal poolable datasource.

I hope it helps.
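
On the repartition/coalesce question, here is a rough sketch of a separate idea, independent of the PR above: assign every output record to one of a small, fixed number of shards, so that at most that many groups reach the JDBC stage. In a Beam pipeline the shard would probably become the key ahead of a GroupByKey; the plain-Java code below only illustrates the assignment step. OutputSharding, shardFor, groupByShard and the shard count of 8 are made-up names and values for illustration, not part of Beam or JdbcIO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OutputSharding {

    // Hypothetical helper: maps any key to a shard in [0, numShards).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(Object key, int numShards) {
        return Math.floorMod(key.hashCode(), numShards);
    }

    // Groups keys by shard. With numShards = 8 there are at most 8 groups,
    // so at most 8 writers (and connections) would be needed downstream.
    static Map<Integer, List<String>> groupByShard(List<String> keys, int numShards) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String key : keys) {
            groups.computeIfAbsent(shardFor(key, numShards), s -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("event-" + i);
        }
        Map<Integer, List<String>> groups = groupByShard(keys, 8);
        System.out.println("distinct shards: " + groups.size());
    }
}
```

The shard count is the knob here: it bounds how many groups exist at the output stage regardless of how many workers run the heavy processing upstream.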

Regards
JB

On 14/03/2018 11:49, Derek Chan wrote:
> Hi,
> 
> We are new to Beam and need some help.
> 
> We are working on a flow that ingests events and writes the aggregated 
> counts to a database. The input rate is rather low (~2000 messages per 
> sec), but the processing is heavy enough that we need to scale out 
> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also 
> low. But because of the number of workers, the pipeline holds 3000 
> connections to the database and keeps hitting the database connection limit.
> 
> Is there a way that we can reduce the concurrency only at the output 
> stage? (In Spark we would have done a repartition/coalesce).
> 
> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google 
> Dataflow.
> 
> Thank you in advance!
> 
> 
> 
