beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-793) JdbcIO can create a deadlock when parallelism is greater than 1
Date Wed, 25 Oct 2017 20:38:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219480#comment-16219480 ]

Eugene Kirpichov commented on BEAM-793:
---------------------------------------

This appears to be a MySQL issue: MySQL can report deadlocks even when there is nothing wrong
with what the application is doing (see https://bugs.mysql.com/bug.php?id=52020).

That said, MySQL's guidance is to "just reissue the transaction in case of deadlock", and that's
what JdbcIO should do, roughly as implemented in the last comment by Guillaume. I don't know
whether we should retry indefinitely or only up to some limit.
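The retry approach described above can be sketched in plain JDBC terms. This is a minimal sketch, not JdbcIO's actual code and not Guillaume's patch: the class `DeadlockRetry`, the `SqlWork` interface, and the `maxAttempts`/`backoffMillis` parameters are hypothetical. It assumes the unit of work runs and commits its own transaction, and that MySQL signals a deadlock with SQLState `40001` / vendor error code 1213. A bounded attempt count is one possible answer to the "indefinitely or up to some limit" question.

```java
import java.sql.SQLException;

/** Hypothetical helper (not part of JdbcIO): re-runs a transaction when the database reports a deadlock. */
final class DeadlockRetry {

  /** A unit of work, e.g. executing a batch of inserts and committing it. */
  @FunctionalInterface
  public interface SqlWork<T> {
    T run() throws SQLException;
  }

  private DeadlockRetry() {}

  // MySQL reports "Deadlock found when trying to get lock" as vendor error 1213
  // with SQLState 40001 (the standard "serialization failure" class).
  public static boolean isDeadlock(SQLException e) {
    return "40001".equals(e.getSQLState()) || e.getErrorCode() == 1213;
  }

  /**
   * Runs {@code work}, retrying only on deadlock, for at most {@code maxAttempts} attempts in total.
   * Sleeps {@code backoffMillis * attempt} ms between attempts (linear backoff).
   */
  public static <T> T runWithRetry(SqlWork<T> work, int maxAttempts, long backoffMillis)
      throws SQLException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return work.run();
      } catch (SQLException e) {
        // Non-deadlock errors, or deadlocks after the last attempt, are propagated unchanged.
        if (!isDeadlock(e) || attempt >= maxAttempts) {
          throw e;
        }
        Thread.sleep(backoffMillis * attempt);
      }
    }
  }
}
```

In a JdbcIO-style writer, the hypothetical `work` would execute the buffered batch with autocommit disabled and then call `commit()`. Because InnoDB rolls back the entire victim transaction when it breaks a deadlock, the work must replay every statement of that transaction, not only the one that failed. Passing a very large `maxAttempts` approximates retrying indefinitely.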

To an earlier point: this code runs on multiple workers in multiple threads, and there's no
"lock" you can grab while inserting into the database. Even if there were, it probably wouldn't
be a good idea:

- Someone else might be working with the database at the same time, so you might still get
a deadlock
- Many databases handle many clients issuing update statements in parallel quite well, and
in that case giving up parallelism would mean giving up performance

JB, why was this moved from 2.2.0 to 2.3.0? That might be valid, but when changing the fix
version of a bug, it would be good to include an explanation of why the issue is not important
enough for the original fix version.

> JdbcIO can create a deadlock when parallelism is greater than 1
> ---------------------------------------------------------------
>
>                 Key: BEAM-793
>                 URL: https://issues.apache.org/jira/browse/BEAM-793
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>             Fix For: 2.3.0
>
>
> With the following JdbcIO configuration, if the parallelism is greater than 1, we can get
> a {{Deadlock found when trying to get lock; try restarting transaction}} error.
> {code}
>         MysqlDataSource dbCfg = new MysqlDataSource();
>         dbCfg.setDatabaseName("db");
>         dbCfg.setUser("user");
>         dbCfg.setPassword("pass");
>         dbCfg.setServerName("localhost");
>         dbCfg.setPortNumber(3306);
>         p.apply(Create.of(data))
>                 .apply(JdbcIO.<Tuple5<Integer, Integer, ByteString, Long, Long>>write()
>                         .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(dbCfg))
>                         .withStatement("INSERT INTO smth(loc,event_type,hash,begin_date,end_date) "
>                                 + "VALUES(?, ?, ?, ?, ?) "
>                                 + "ON DUPLICATE KEY UPDATE event_type=VALUES(event_type),end_date=VALUES(end_date)")
>                         .withPreparedStatementSetter(
>                                 new JdbcIO.PreparedStatementSetter<Tuple5<Integer, Integer, ByteString, Long, Long>>() {
>                             @Override
>                             public void setParameters(Tuple5<Integer, Integer, ByteString, Long, Long> element,
>                                     PreparedStatement statement) throws Exception {
>                                 statement.setInt(1, element.f0);
>                                 statement.setInt(2, element.f1);
>                                 statement.setBytes(3, element.f2.toByteArray());
>                                 statement.setLong(4, element.f3);
>                                 statement.setLong(5, element.f4);
>                             }
>                         }));
> {code}
> This can happen due to the {{autocommit}}. I'm going to investigate.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
