beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Reuven Lax (JIRA)" <>
Subject [jira] [Commented] (BEAM-1831) Checking of containment in createdTables may have race condition in StreamingWriteFn
Date Fri, 28 Apr 2017 16:52:04 GMT


Reuven Lax commented on BEAM-1831:

Some more details - createdTables is ConcurrentHashMap, and is therefore safe for multithreaded
access. The reason for the second call to contains under the lock is _not_ for correctness,
it's to prevent a huge number of calls to createTable, which overwhelm BigQuery quota. Spurious
calls to createTable is fine from a correctness standpoint - BigQuery will simply fail subsequent
calls. The first check is there to avoid lock contention in the common case, which empirically
was a problem.

While this patterns _looks_ like double-checked locking, it in fact isn't. 

> Checking of containment in createdTables may have race condition in StreamingWriteFn
> ------------------------------------------------------------------------------------
>                 Key: BEAM-1831
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Ted Yu
>            Assignee: Reuven Lax
>            Priority: Minor
> {code}
>   public TableReference getOrCreateTable(BigQueryOptions options, String tableSpec)
>       throws InterruptedException, IOException {
>     TableReference tableReference = BigQueryHelpers.parseTableSpec(tableSpec);
>     if (createDisposition != createDisposition.CREATE_NEVER
>         && !createdTables.contains(tableSpec)) {
>       synchronized (createdTables) {
>         // Another thread may have succeeded in creating the table in the meanwhile,
>         // check again. This check isn't needed for correctness, but we add it to prevent
>         // every thread from attempting a create and overwhelming our BigQuery quota.
>         DatasetService datasetService = bqServices.getDatasetService(options);
>         if (!createdTables.contains(tableSpec)) {
> {code}
> The first createdTables.contains() check is outside synchronized block.
> At least createdTables should be declared volatile for the double checked locking to

This message was sent by Atlassian JIRA

View raw message