apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: At-least once semantics for Kafka to Cassndra ingest
Date Wed, 15 Feb 2017 09:30:04 GMT
That is the default behavior Himanshu. The Kafka operator will re-read
from the older checkpointed offset if there is a failure in the Kafka
operator. If there is a failure in any of the downstream operators
including Cassandra, they will read the previous messages from the
buffer present in the upstream operator, so in effect you won't lose
messages.

Thanks

> On Feb 15, 2017, at 10:34 AM, Himanshu Bari <himanshubari@gmail.com> wrote:
>
> Hey guys
>
> We are looking for at least once semantics and not exactly once. Sinxe the
> sink is Cassandra it is ok if tge same record is written twice..iy will
> just overwrite on any reprocessing...
>
> Himanshu
>
>> On Feb 14, 2017 9:01 PM, "Priyanka Gugale" <priyanka@datatorrent.com> wrote:
>>
>> For this particular case kafka -> cassandra, you need not worry about
>> partial windows. Cassandra output operator does batch processing i.e. all
>> records received in a window will be written at end window. So IMO, if you
>> set exactly once processing on Kafka Input operator, and choose
>> transactional cassandra output operator you will achieve exactly once
>> processing. If you have other operators in your dag you might want to make
>> sure they are idempotent (please check blog shared by Sandesh for
>> reference).
>>
>> -Priyanka
>>
>> On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <sandesh@datatorrent.com>
>> wrote:
>>
>>> Settings mentioned by Sanjay, will only guarantee exactly once for
>> Windows,
>>> but not for partial window processed by the operator, in a way that
>> setting
>>> is a misnomer.
>>> To achieve Exactly once, there are some precoditions that need to be met
>>> along with the support in the output operator. Here is a blog that gives
>>> the idea about exactly once,
>>> https://www.datatorrent.com/blog/end-to-end-exactly-once-
>> with-apache-apex/
>>>
>>> On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sanjay@datatorrent.com>
>>> wrote:
>>>
>>>> Have you taken a look at
>>>> http://apex.apache.org/docs/apex/application_development/#exactly-once
>> ?
>>>> i.e. setting that processing mode on all the operators in the pipeline
>> .
>>>>
>>>> Join us at Apex Big Data World-San Jose <
>>>> http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
>>>>
>>>> http://www.apexbigdata.com/san-jose-register.html
>>>>
>>>>
>>>> On 2/14/17, 12:00 PM, "Himanshu Bari" <himanshubari@gmail.com> wrote:
>>>>
>>>>    How to ensure that the Kafka to Cassandra ingestion pipeline in
>> Apex
>>>> will
>>>>    guarantee exactly once processing semantics.
>>>>    Eg. Message was read from Kafka but apex app died before it was
>>> written
>>>>    successfully to Cassandra.
>>>>
>>>>
>>>>
>>>> --
>>> *Join us at Apex Big Data World-San Jose
>>> <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
>>> [image: http://www.apexbigdata.com/san-jose-register.html]
>>> <http://www.apexbigdata.com/san-jose-register.html>
>>>
>>

Mime
View raw message