geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ken Howe <kh...@pivotal.io>
Subject Re: Review Request 56506: GEODE-2398: Retry oplog channel.write on silent failures
Date Sat, 11 Feb 2017 00:06:21 GMT


> On Feb. 10, 2017, 10:02 p.m., Darrel Schneider wrote:
> > geode-core/src/main/java/org/apache/geode/internal/cache/Oplog.java, line 5213
> > <https://reviews.apache.org/r/56506/diff/1/?file=1628630#file1628630line5213>
> >
> >     What we have seen so far is that when something is wrong it has always been
the position in "bb".
> >     This code basically says that if the result of "channel.write" is not consistent
with the "bb" position then use the "channel" position as the final authority on what is correct.
> >     It is possible that the "bb" position is correct according to the "channel"
position. So when you fix the "bb" position to be consistent with the channel, I think you
should also fix the result of "channel.write" (stored in "channelBytesWritten"). In the retry
block you can just update both of these. Then move the inc of "flushed" to be done after this
block (right before "while bb.hasRemaining").

Good catch. Original fix accounted for the number of bytes actually written when resetting
the ByteBuffer position for retrying, but if the change in channel position was > 0 and
didn't match write() return value, then the number of bytes flushed wasn't updated correctly.


> On Feb. 10, 2017, 10:02 p.m., Darrel Schneider wrote:
> > geode-core/src/main/java/org/apache/geode/internal/cache/Oplog.java, line 5223
> > <https://reviews.apache.org/r/56506/diff/1/?file=1628630#file1628630line5223>
> >
> >     It would be nice to include more info in this IOException related to the inconsistency
in channleBytesWritten vs. bytes read from bb.

The IOException will now show siomething like:

Failed to write Oplog entry toBACKUPtestRegion_1.crf: channel.write() returned 0, change in
channel position = 0, change in source buffer position = 20


- Ken


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56506/#review164991
-----------------------------------------------------------


On Feb. 11, 2017, midnight, Ken Howe wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56506/
> -----------------------------------------------------------
> 
> (Updated Feb. 11, 2017, midnight)
> 
> 
> Review request for geode, anilkumar gingade and Darrel Schneider.
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> Implemented limited retries in two forms of Oplog.flush() when channel.write() is called.
> If write() returns bytes witten less than the change in the ByteBuffer positions, then
reset
> buffer positions and re-try writing for a limited number of times. Throws
> IOException if the write doesn't succeeded after a few retries (max
> number of retries is defined by a static).
> 
> Added new unit tests.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/internal/cache/Oplog.java 0b98364 
>   geode-core/src/test/java/org/apache/geode/internal/cache/OplogFlushTest.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/56506/diff/
> 
> 
> Testing
> -------
> 
> Started precheckin
> 
> 
> Thanks,
> 
> Ken Howe
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message