trafficserver-users mailing list archives

From "Alan M. Carroll" <>
Subject Re: [VOTE] Release Apache Traffic Server 4.2.1 (RC0)
Date Wed, 16 Apr 2014 15:18:03 GMT
I was asked for a translation of my previous email, in which I bonged (voted -1 on) the 4.2.1 RC0.

The problem in 4.2.0 was a shift in the set of WKS (well-known string) values. These are not just
live data; they are also written to the cache in the object headers, so if they change at all,
the cache is de facto invalidated. The 4.2.0 crashes (TS-2564) are due to this: various secondary
bits of data get written inconsistently, which in turn causes ATS to look up the wrong data for
header fields. For instance, the VARY field would be written out along with a hint about where it
was in the header. When the object was read back, 4.2.0 would use the stored WKS index to look up
the hint location, get the wrong location (because VARY had shifted), and use that to find the
wrong data for VARY (possibly null or unallocated memory).
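To make the failure concrete, here is a toy Python model of how a stored index goes stale. It is not ATS code (ATS is C++): the table contents, the `serialize`/`lookup` helpers, and the dict-based "hint" are hypothetical simplifications. The only point is that a hint keyed by a WKS index written under one table is misread under another.

```python
# Hypothetical WKS tables: inserting "Age" shifts every later index by one.
WKS_OLD = ["Accept", "Content-Type", "Vary", "Via"]          # version that wrote the object
WKS_NEW = ["Accept", "Age", "Content-Type", "Vary", "Via"]   # version that reads it back

FIELDS = [("Vary", "Accept-Encoding"), ("Via", "1.1 proxy")]


def serialize(fields, wks):
    """Store each field as (WKS index, value) plus a hint: WKS index -> slot position."""
    stored = [(wks.index(name), value) for name, value in fields]
    hints = {wks.index(name): slot for slot, (name, _) in enumerate(fields)}
    return stored, hints


def lookup(name, stored, hints, wks):
    """Find a field's value by trusting the stored hint, as the reader would."""
    slot = hints.get(wks.index(name))
    if slot is None:
        return None  # field appears to be absent
    return stored[slot][1]


# The object is written to "disk" under the old table.
stored, hints = serialize(FIELDS, WKS_OLD)
```

Read back with `WKS_OLD`, `lookup("Vary", ...)` returns `"Accept-Encoding"`. With `WKS_NEW`, the same stored hint sends VARY to the `Via` slot and returns `"1.1 proxy"`, and `Via` itself is not found at all.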

To fix this, 4.2.1 simply clears all the hints and rewrites them when the object is read from
disk, if the object was written by a cache version earlier than 4.2.1. This ignores the stored
values and uses only the current in-memory values.
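A minimal sketch of that fixup, again as hypothetical Python rather than the actual ATS source. It assumes each stored field still carries its name string, so the hints can be recomputed from the current table, and it uses an invented `FIXUP_VERSION` tuple for the version check.

```python
from dataclasses import dataclass, field

# Hypothetical current WKS table and cut-off version; not ATS's real values.
WKS_CURRENT = ["Accept", "Age", "Content-Type", "Vary", "Via"]
FIXUP_VERSION = (4, 2, 1)


@dataclass
class CachedHeader:
    version: tuple                            # version that wrote the object
    names: list                               # field names, in slot order
    hints: dict = field(default_factory=dict)  # WKS index -> slot, as stored


def rebuild_hints(hdr, wks):
    """Discard the stored hints and recompute them from the in-memory WKS table."""
    hdr.hints.clear()
    for slot, name in enumerate(hdr.names):
        hdr.hints[wks.index(name)] = slot


def on_disk_read(hdr, wks=WKS_CURRENT):
    """Apply the fixup only to objects written before FIXUP_VERSION."""
    if hdr.version < FIXUP_VERSION:
        rebuild_hints(hdr, wks)
    return hdr
```

Tuple comparison makes the version check read naturally: `(4, 2, 0) < (4, 2, 1)` is true, so an old object's hints are rebuilt, while a 4.2.1 object's hints are left untouched.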

However, it turns out that when the object is read from disk, it may be stored in the RAM
cache. If it is retrieved from the RAM cache later, it goes through the same logic as if it had
been loaded from disk, which includes clearing and rewriting the hints. The ATS logic, though,
doesn't lock the object for this, because the object is expected to be read-only once it has
been read from disk. The TS-2564 logic violates this and thereby creates a race condition between
two transactions that both access the same object. It is possible for one to check the valid
hints for a field and then, while it is trying to retrieve the field, for the other transaction
to clear the hints, causing the field to not be found. This leads to a crash because the logic
assumes (reasonably) that if it has checked the hints and verified the field's presence, the
field is present and will be found. If the field is not found, you get a null pointer dereference.
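The race can be reproduced deterministically by splitting the fixup into steps and interleaving the two transactions by hand. This is a hypothetical model: `fixup_steps` is a generator standing in for the unlocked clear-and-rebuild, and Python's `TypeError` on `values[None]` stands in for the null pointer dereference.

```python
def fixup_steps(hints, names, wks):
    """Hypothetical unlocked fixup, paused after clearing so the window is visible."""
    hints.clear()
    yield "cleared"
    for slot, name in enumerate(names):
        hints[wks.index(name)] = slot
    yield "rebuilt"


def interleave():
    wks = ["Accept", "Age", "Content-Type", "Vary", "Via"]
    names = ["Vary", "Via"]
    values = ["Accept-Encoding", "1.1 proxy"]
    hints = {3: 0, 4: 1}  # already fixed up once, on the disk read

    vary = wks.index("Vary")
    present = vary in hints              # transaction A: presence check passes
    b = fixup_steps(hints, names, wks)
    next(b)                              # transaction B: RAM-cache hit re-runs fixup, clears hints
    slot = hints.get(vary)               # transaction A: retrieves the slot, hint is gone
    next(b)                              # transaction B: finishes rebuilding

    crashed = False
    try:
        values[slot]                     # A trusts its earlier presence check
    except TypeError:                    # Python's analogue of a null dereference
        crashed = True
    return present, slot, crashed
```

`interleave()` returns `(True, None, True)`: the check succeeded, the retrieval found nothing, and the access that followed failed. With real threads the same sequence happens only occasionally, which is why such crashes are intermittent.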

The solution is to prevent the 4.2.0 fixup from being done on objects retrieved from the RAM
cache. There's no need, as the fixup was already done when the object was read from disk and put
in the RAM cache. There is no race condition for disk reads, because those objects are not shared
until after the fixup.
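One way to express the fix, sketched under the same toy assumptions. The `Cache` class, the use of `copy.deepcopy` to stand in for deserializing a private copy from disk, and the `fixups` counter are all illustrative, not ATS internals. The fixup runs only on the disk path, before the object is published to the RAM cache, so a RAM hit never mutates a shared object.

```python
import copy
from dataclasses import dataclass, field

FIXUP_VERSION = (4, 2, 1)  # hypothetical cut-off version


@dataclass
class CachedObject:
    version: tuple
    names: list
    hints: dict = field(default_factory=dict)
    fixups: int = 0  # how many times the fixup ran, for illustration only


class Cache:
    def __init__(self, wks):
        self.wks = wks
        self.disk = {}  # key -> object as stored on disk
        self.ram = {}   # key -> shared, read-only object

    def _fixup(self, obj):
        obj.hints.clear()
        for slot, name in enumerate(obj.names):
            obj.hints[self.wks.index(name)] = slot
        obj.fixups += 1

    def read(self, key):
        obj = self.ram.get(key)
        if obj is not None:
            return obj  # RAM hit: shared object, never mutated here
        obj = copy.deepcopy(self.disk[key])  # disk read: private to this reader
        if obj.version < FIXUP_VERSION:
            self._fixup(obj)  # fix up *before* anyone else can see it
        self.ram[key] = obj   # only now is the object shared
        return obj
```

Reading the same key twice returns the same object with `fixups == 1`: the second read is a RAM hit and skips the fixup entirely, so there is no window in which another transaction can see cleared hints.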
