beam-user mailing list archives

From: Dan Halperin <dhalp...@google.com>
Subject: Re: KafkaIO Questions
Date: Wed, 22 Jun 2016 16:25:26 GMT
Thanks, Jesse!

On Wed, Jun 22, 2016 at 9:10 AM, Jesse Anderson <jesse@smokinghand.com>
wrote:

> Confirmed that KafkaIO is consuming and producing as of
> f4809446b931c02e1dc5da0d86f01faf00b53581.
>
> On Fri, Jun 10, 2016 at 6:40 PM Raghu Angadi <rangadi@google.com> wrote:
>
>> KafkaIO reader reports Long.MIN_VALUE
>> <https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1074>
>> for watermark in this case (it hasn't read any records yet). That is fine,
>> right?
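>>
>> For reference, a minimal sketch of what I mean (illustrative only, not
>> the actual KafkaIO code; curTimestamp is a hypothetical stand-in for
>> however the reader tracks the last record's timestamp):
>>
>>   import org.joda.time.Instant;
>>
>>   class SketchReader {
>>     // Timestamp of the last record read; stays null until the first record.
>>     private Instant curTimestamp = null;
>>
>>     public Instant getWatermark() {
>>       if (curTimestamp == null) {
>>         // Nothing read yet: report the minimum watermark so the runner
>>         // neither treats the source as finished nor advances past data
>>         // we have not seen.
>>         return new Instant(Long.MIN_VALUE);
>>       }
>>       return curTimestamp;
>>     }
>>   }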
>>
>> On Fri, Jun 10, 2016 at 5:46 PM, Thomas Groh <tgroh@google.com> wrote:
>>
>>> If we're reading from an unbounded source and it reports the watermark as
>>> BoundedWindow#TIMESTAMP_MAX_VALUE, the InProcessRunner won't reinvoke the
>>> source; a call to start() returning false by itself just means that we
>>> should call into it again later, but the output watermark should still be
>>> held by the source.
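>>>
>>> To make that contract concrete, roughly this (a hypothetical sketch, not
>>> the actual InProcessRunner code; reader is assumed to be an
>>> UnboundedSource.UnboundedReader in scope, and process() stands in for
>>> delivering the element downstream):
>>>
>>>   boolean available = reader.start();
>>>   while (true) {
>>>     if (!available) {
>>>       if (reader.getWatermark().isBefore(BoundedWindow.TIMESTAMP_MAX_VALUE)) {
>>>         // No element right now, but the source is not done: hold the
>>>         // output watermark and schedule another advance() later.
>>>       } else {
>>>         // Watermark reached TIMESTAMP_MAX_VALUE: the source is finished.
>>>       }
>>>       break;
>>>     }
>>>     process(reader.getCurrent(), reader.getCurrentTimestamp());
>>>     available = reader.advance();
>>>   }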
>>>
>>> On Fri, Jun 10, 2016 at 4:44 PM, Raghu Angadi <rangadi@google.com>
>>> wrote:
>>>
>>>> It looks like InProcessPipelineRunner instantiates the source, calls
>>>> start() on it, and immediately closes it. In this case start() returns
>>>> false, and the runner seems to think the source is done (which is
>>>> incorrect?).
>>>>
>>>> On Fri, Jun 10, 2016 at 4:24 PM, Jesse Anderson <jesse@smokinghand.com>
>>>> wrote:
>>>>
>>>>> Raghu and I spent some time on a hangout looking at this. It looks
>>>>> like there is a problem with unbounded collections with KafkaIO
>>>>> on InProcessPipelineRunner.
>>>>>
>>>>> We changed the code to read a bounded collection by setting
>>>>> withMaxNumRecords and used DirectPipelineRunner. That worked and processed
>>>>> the messages.
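>>>>>
>>>>> For anyone following along, the change looked roughly like this (a
>>>>> sketch from memory; the exact builder methods and coder/deserializer
>>>>> setup depend on your KafkaIO version, and the server address and record
>>>>> count are just our test values):
>>>>>
>>>>>   import org.apache.beam.sdk.io.kafka.KafkaIO;
>>>>>   import com.google.common.collect.ImmutableList;
>>>>>
>>>>>   pipeline.apply(KafkaIO.read()
>>>>>       .withBootstrapServers("localhost:9092")
>>>>>       .withTopics(ImmutableList.of("eventsim"))
>>>>>       // Caps the read at a fixed number of records, which turns the
>>>>>       // unbounded Kafka source into a bounded collection.
>>>>>       .withMaxNumRecords(1000));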
>>>>>
>>>>> Next, we used InProcessPipelineRunner with a bounded collection. That
>>>>> worked and processed the messages.
>>>>>
>>>>> We changed it back to an unbounded collection
>>>>> using InProcessPipelineRunner. That didn't work and continued to output
>>>>> error messages similar to the ones I've shown in the thread.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jesse
>>>>>
>>>>>
>>>>> On Wed, Jun 8, 2016 at 7:12 PM Jesse Anderson <jesse@smokinghand.com>
>>>>> wrote:
>>>>>
>>>>>> I tried a 0.9.0 broker and got the same error. Not sure if it
>>>>>> makes a difference, but I'm using Confluent Platform 2.0 and 3.0 for this
>>>>>> testing.
>>>>>>
>>>>>> On Wed, Jun 8, 2016 at 5:20 PM Jesse Anderson <jesse@smokinghand.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Still open to screensharing and resolving over a hangout.
>>>>>>>
>>>>>>> On Wed, Jun 8, 2016 at 5:19 PM Raghu Angadi <rangadi@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Wed, Jun 8, 2016 at 1:56 PM, Jesse Anderson <jesse@smokinghand.com> wrote:
>>>>>>>>
>>>>>>>>> [pool-2-thread-1] INFO org.apache.beam.sdk.io.kafka.KafkaIO -
>>>>>>>>> Reader-0: resuming eventsim-0 at default offset
>>>>>>>>>
>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> [pool-2-thread-1] INFO org.apache.kafka.common.utils.AppInfoParser -
>>>>>>>>> Kafka commitId : 23c69d62a0cabf06
>>>>>>>>> [pool-2-thread-1] WARN org.apache.beam.sdk.io.kafka.KafkaIO -
>>>>>>>>> Reader-0: getWatermark() : no records have been read yet.
>>>>>>>>> [pool-77-thread-1] INFO org.apache.beam.sdk.io.kafka.KafkaIO -
>>>>>>>>> Reader-0: Returning from consumer pool loop
>>>>>>>>> [pool-78-thread-1] WARN org.apache.beam.sdk.io.kafka.KafkaIO -
>>>>>>>>> Reader-0: exception while fetching latest offsets. ignored.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This reader is closed before the exception. The exception is due to
>>>>>>>> an action during close and can be ignored. The main question is why this
>>>>>>>> was closed...
>>>>>>>>
>>>>>>>
>>>>
>>>
>>
