samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: Review Request 45258: Abandon producer retry after a certain # of errors : SAMZA-911
Date Thu, 14 Apr 2016 23:39:04 GMT


> On April 14, 2016, 8:43 p.m., Chris Pettitt wrote:
> > samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala,
line 71
> > <https://reviews.apache.org/r/45258/diff/1/?file=1312769#file1312769line71>
> >
> >     This is not directly related to your commit, but:
> >     
> >     This function is not reentrant. Is that expected? For example, it would be possible
for a the Kafka callback to set sendFailed to true but a subsequent send would reset this
flag.
> >     
> >     
> >     ---
> >     
> >     Separately, there is an implicit assumption that retryBackoff is providing a
happens-before constraint between an invocation of the exception handler in retryBackoff and
a subsequent invocation of the loop. This likely always holds, but a CAS for numRetries would
cover cases where that does not hold.
> 
> Jagadish Venkatraman wrote:
>     Yes, that's true. 
>     
>     1.It's entirely possible for a Kafka callback to set sendFailed to true, and a subsequent
send to reset this flag. Ideally, errors in the callbacks should shut-down the producer in
the same callback thread.
>     2. It's also possible for a subsequent send (for msg2) to be processed even if the
send for the previous message (msg1) fails.
>     
>     This is the race in the producer that Navina mentioned in her review of this RB.
(and requested that a separate jira be created to track). SAMZA-934 tracks this.
> 
> Chris Pettitt wrote:
>     Yes, my list was not exhaustive :). Fine to track the first issue via JIRA. The second
issue is introduced in this RB, tho.

I think the only thing that may fail the happens before constraint is: Part of the exception
handler is updated in one thread, and the rest is updated in a separate thread. (we don't
have this case now). I'll change it to an AtomicInteger instead of an integer.


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45258/#review128972
-----------------------------------------------------------


On April 14, 2016, 10:05 p.m., Jagadish Venkatraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45258/
> -----------------------------------------------------------
> 
> (Updated April 14, 2016, 10:05 p.m.)
> 
> 
> Review request for samza, Jake Maes, Navina Ramesh, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-911
>     https://issues.apache.org/jira/browse/SAMZA-911
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Currently, the KafkaSystemProducer's producer loop keeps retrying indefinitely when there
is an exception in the retryBackOff loop. If there are repeated exceptions, then it makes
sense to retry for awhile, and then fail the container.
> 
> Long term, we should focus on getting rid off the retryBackOff loop, and close the producer
object in the callback during failure. This will guarantee in-order delivery.
> 
> 1.Modified the KafkaSystemProducer to take a maxRetries. (currently, its set to 30).
> 2.Add tests to verify retry in case of RetriableExceptions.
> 
> 
> Diffs
> -----
> 
>   samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala
9a44d46d29a1997958a9d2bbf7be0bde860fff64 
>   samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemProducer.scala
39426d8cf64516ec4fdc0cb4ff60b1df3a757470 
> 
> Diff: https://reviews.apache.org/r/45258/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests to verify functionality.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message