apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Himanshu Bari <himanshub...@gmail.com>
Subject Re: At-least once semantics for Kafka to Cassndra ingest
Date Wed, 15 Feb 2017 09:39:30 GMT
Great. Thats what I thought. But, I suspect we might be seeing otherwise in
our app. Is there anything that needs to be done wrt to the offset
management in the kafka reader app to ensure this works correctly? Eg Does
the developer need to decide when the kafka reader app moves the Kafka
topic offset forward to acknowledge that all the messages read to that
point were successfully written into Cassandra?

On Feb 15, 2017 1:32 AM, "Pramod Immaneni" <pramod@datatorrent.com> wrote:

> That is the default behavior Himanshu. The Kafka operator will re-read
> from the older checkpointed offset if there is a failure in the Kafka
> operator. If there is a failure in any of the downstream operators
> including Cassandra, they will read the previous messages from the
> buffer present in the upstream operator, so in effect you won't lose
> messages.
>
> Thanks
>
> > On Feb 15, 2017, at 10:34 AM, Himanshu Bari <himanshubari@gmail.com>
> wrote:
> >
> > Hey guys
> >
> > We are looking for at least once semantics and not exactly once. Sinxe
> the
> > sink is Cassandra it is ok if tge same record is written twice..iy will
> > just overwrite on any reprocessing...
> >
> > Himanshu
> >
> >> On Feb 14, 2017 9:01 PM, "Priyanka Gugale" <priyanka@datatorrent.com>
> wrote:
> >>
> >> For this particular case kafka -> cassandra, you need not worry about
> >> partial windows. Cassandra output operator does batch processing i.e.
> all
> >> records received in a window will be written at end window. So IMO, if
> you
> >> set exactly once processing on Kafka Input operator, and choose
> >> transactional cassandra output operator you will achieve exactly once
> >> processing. If you have other operators in your dag you might want to
> make
> >> sure they are idempotent (please check blog shared by Sandesh for
> >> reference).
> >>
> >> -Priyanka
> >>
> >> On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <sandesh@datatorrent.com
> >
> >> wrote:
> >>
> >>> Settings mentioned by Sanjay, will only guarantee exactly once for
> >> Windows,
> >>> but not for partial window processed by the operator, in a way that
> >> setting
> >>> is a misnomer.
> >>> To achieve Exactly once, there are some precoditions that need to be
> met
> >>> along with the support in the output operator. Here is a blog that
> gives
> >>> the idea about exactly once,
> >>> https://www.datatorrent.com/blog/end-to-end-exactly-once-
> >> with-apache-apex/
> >>>
> >>> On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <sanjay@datatorrent.com>
> >>> wrote:
> >>>
> >>>> Have you taken a look at
> >>>> http://apex.apache.org/docs/apex/application_development/#
> exactly-once
> >> ?
> >>>> i.e. setting that processing mode on all the operators in the pipeline
> >> .
> >>>>
> >>>> Join us at Apex Big Data World-San Jose <
> >>>> http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
> >>>>
> >>>> http://www.apexbigdata.com/san-jose-register.html
> >>>>
> >>>>
> >>>> On 2/14/17, 12:00 PM, "Himanshu Bari" <himanshubari@gmail.com>
wrote:
> >>>>
> >>>>    How to ensure that the Kafka to Cassandra ingestion pipeline in
> >> Apex
> >>>> will
> >>>>    guarantee exactly once processing semantics.
> >>>>    Eg. Message was read from Kafka but apex app died before it was
> >>> written
> >>>>    successfully to Cassandra.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>> *Join us at Apex Big Data World-San Jose
> >>> <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
> >>> [image: http://www.apexbigdata.com/san-jose-register.html]
> >>> <http://www.apexbigdata.com/san-jose-register.html>
> >>>
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message