beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eugene Kirpichov <kirpic...@google.com>
Subject Re: Reducing database connection with JdbcIO
Date Wed, 14 Mar 2018 19:29:22 GMT
I wonder if this is https://issues.apache.org/jira/browse/BEAM-3245 , but
in that JIRA it is unclear exactly in what cases it is broken and how
much. +Thomas
Groh <tgroh@google.com> +Kenn Knowles <klk@google.com> Dataflow Runner
normally doesn't leak DoFn's if their ProcessElement threw an exception,
right?

On Wed, Mar 14, 2018 at 12:24 PM Romain Manni-Bucau <rmannibucau@gmail.com>
wrote:

> side note: try to do a thread dump on the workers maybe
>
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
> 2018-03-14 20:21 GMT+01:00 Eugene Kirpichov <kirpichov@google.com>:
>
>> "Jdbcio will create for each prepared statement new connection" - this is
>> not the case: the connection is created in @Setup and deleted in @Teardown.
>>
>> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503
>>
>> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L631
>>
>> Something else must be going wrong.
>>
>> On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <aleksandr.vl@gmail.com>
>> wrote:
>>
>>> Hello, we had similar problem. Current jdbcio will cause alot of
>>> connection errors.
>>>
>>> Typically you have more than one prepared statement. Jdbcio will create
>>> for each prepared statement new connection(and close only in teardown) So
>>> it is possible that connection will get timeot or in case in case of auto
>>> scaling you will get to many connections to sql.
>>> Our solution was to create connection pool in setup and get connection
>>> and return back to pool in processElement.
>>>
>>> Best Regards,
>>> Aleksandr Gortujev.
>>>
>>> 14. märts 2018 8:52 PM kirjutas kuupäeval "Jean-Baptiste Onofré" <
>>> jb@nanthrax.net>:
>>>
>>> Agree especially using the current JdbcIO impl that creates connection
>>> in the @Setup. Or it means that @Teardown is never called ?
>>>
>>> Regards
>>> JB
>>> Le 14 mars 2018, à 11:40, Eugene Kirpichov <kirpichov@google.com> a
>>> écrit:
>>>>
>>>> Hi Derek - could you explain where does the "3000 connections" number
>>>> come from, i.e. how did you measure it? It's weird that 5-6 workers would
>>>> use 3000 connections.
>>>>
>>>> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <derekcsy@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are new to Beam and need some help.
>>>>>
>>>>> We are working on a flow to ingest events and writes the aggregated
>>>>> counts to a database. The input rate is rather low (~2000 message per
>>>>> sec), but the processing is relatively heavy, that we need to scale out
>>>>> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is
>>>>> also
>>>>> low. But because of the number of workers, it keeps 3000 connections
to
>>>>> the database and it keeps hitting the database connection limits.
>>>>>
>>>>> Is there a way that we can reduce the concurrency only at the output
>>>>> stage? (In Spark we would have done a repartition/coalesce).
>>>>>
>>>>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
>>>>> Dataflow.
>>>>>
>>>>> Thank you in advance!
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>

Mime
View raw message