beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "peay (JIRA)" <>
Subject [jira] [Commented] (BEAM-1751) Singleton ByteKeyRange with BigtableIO and Dataflow runner
Date Wed, 22 Mar 2017 23:04:41 GMT


peay commented on BEAM-1751:

I can reproduce with:

public static void main(String[] args) throws URISyntaxException, IOException {
    MyOptions options = PipelineOptionsFactory
    Pipeline pipeline = Pipeline.create(options);

    BigtableOptions bigtableOptions =
        new BigtableOptions.Builder()

    PCollection<Row> properties = pipeline

My original code uses the output of BigtableIO as a side input to an unbounded collection
from KafkaIO, but I can reproduce it with this simple program that only has a bounded BigtableIO.

For the runner, I am using {{--maxNumWorkers=1 --streaming}} and that is essentially it.

> Singleton ByteKeyRange with BigtableIO and Dataflow runner
> ----------------------------------------------------------
>                 Key: BEAM-1751
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-java-gcp
>    Affects Versions: 0.5.0
>            Reporter: peay
>            Assignee: Daniel Halperin
> I am getting this exception on a smallish table of a couple hundreds of rows from Bigtable,
when running on Dataflow with a single worker.
> This doesn't occur with the direct runner on my laptop, only when running on Dataflow.
Backtrace is from Beam 0.5.
> {code}java.lang.IllegalArgumentException: Start [xxxxxxxxxx] must be less than end [xxxxxxxxxx]
> 	at
> 	at<init>(
> 	at
> 	at$BigtableSource.withEndKey(
> 	at$BigtableReader.splitAtFraction(
> 	at$BigtableReader.splitAtFraction(
> 	at org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.getCheckpointMark(
> 	at org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(
> 	at org.apache.beam.runners.dataflow.DataflowUnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.getCheckpointMark(
> 	at
> 	at
> 	at$700(
> 	at$
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(
> 	at java.util.concurrent.ThreadPoolExecutor$
> 	at
> {code}
> This is in the log right before:
> {code}
> "Proposing to split ByteKeyRangeTracker{range=ByteKeyRange{startKey=[xxxxxxxxxx], endKey=[]},
position=null} at fraction 0.0 (key [xxxxxxxxxx])"   
> {code}
> I have replaced the actual key with {{xxxxxxxxxx}}, but it is always the same everywhere.
the end position is obtained by truncating the fractional part of {{size * fraction}}, such
that the resulting offset can just be zero if {{fraction}} is too small. `ByteKeyRange` does
not allow a singleton range, however. Since {{fraction}} is zero here, the call to {{splitAtFraction}}

This message was sent by Atlassian JIRA

View raw message