beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Baptiste Onofré (JIRA) <j...@apache.org>
Subject [jira] [Assigned] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set
Date Tue, 01 May 2018 19:35:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jean-Baptiste Onofré reassigned BEAM-3714:
------------------------------------------

    Assignee: Innocent  (was: Jean-Baptiste Onofré)

> JdbcIO.read() should create a forward-only, read-only result set
> ----------------------------------------------------------------
>
>                 Key: BEAM-3714
>                 URL: https://issues.apache.org/jira/browse/BEAM-3714
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-jdbc
>            Reporter: Eugene Kirpichov
>            Assignee: Innocent
>            Priority: Major
>             Fix For: 2.5.0
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934] -
a user is trying to load a large table from MySQL, and the MySQL JDBC driver requires special
measures when loading large result sets.
> JdbcIO currently calls simply "connection.prepareStatement(query)" https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508
- it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY - these values
should always be used.
> Seems that different databases have different requirements for streaming result sets.
> E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must not be in
autocommit mode." https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
. Oracle, I think, doesn't have any special requirements but I don't know. Fetch size should
probably still be set to a reasonably large value.
> Seems that the common denominator of these requirements is: set fetch size to a reasonably
large but not maximum value; disable autocommit (there's nothing to commit in read() anyway).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message