Date: Wed, 16 Apr 2014 10:34:34 -0600
Subject: Re: [VOTE] Release Apache Traffic Server 4.2.1 (RC0)
From: Phil Sorber
To: "users@trafficserver.apache.org", "dev@trafficserver.apache.org"

So given all this, I am voting -1 and calling this vote a failure. I am attempting to test Alan's new patch and hopefully will roll a 4.2.1-rc1 later this week.

Thanks


On Wed, Apr 16, 2014 at 9:18 AM, Alan M. Carroll <amc@network-geographics.com> wrote:
> I was asked for a translation of my previous email, bonging the 4.2.1 RC0.

> The problem in 4.2.0 was a shift in the set of WKS (well-known string) values. These are not just live data; they are also written to the cache in the object headers, so if they change at all, the cache is de facto invalidated. The 4.2.0 crashes (TS-2564) are due to this: various secondary bits of data get written inconsistently, which in turn causes ATS to look up the wrong data for header fields. For instance, the VARY field would be written out along with a hint about where it was in the header. When the object was read back, ATS 4.2.0 would use the stored WKS index to look up the hint location, get the wrong location (because VARY had shifted), and use that to find the wrong data for VARY (possibly null or unallocated memory).

> To fix this, 4.2.1 simply clears all the hints and rewrites them when the object is read from disk, if the object was written with a cache version earlier than 4.2.1. This ignores the stored values and uses only the current in-memory values.

> However, it turns out that an object read from disk may be stored in the RAM cache. If it is later retrieved from the RAM cache, it goes through the same logic as if it had been loaded from disk, which includes clearing and rewriting the hints. The ATS logic, though, doesn't lock the object for this, because the object is expected to be read-only once it has been read from disk. The TS-2564 logic violates this and thereby creates a race condition between two transactions that access the same object. One transaction can check the valid hints for a field and then, while it is retrieving the field, the other transaction can clear the hints, so the field is not found. This leads to a crash because the logic assumes (reasonably) that if it has checked the hints and verified the field's presence, the field is present and will be found. If the field is not found, you get a null pointer dereference.

> The solution is to prevent the 4.2.0 fixup from being done on objects retrieved from the RAM cache. There's no need for it, as the fixup was already done when the object was read from disk and put in the RAM cache. There is no race condition for disk reads because those objects are not shared until after the fixup.
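
A minimal Python sketch of the stale-index problem Alan describes, under loudly stated assumptions: this is a simplified model, not ATS code or its on-disk format, and every name in it (`WKS_OLD`, `WKS_NEW`, `CachedHeader`, `fixup_hints`) is hypothetical. A header stores, for each well-known field, a hint (its slot) keyed by the field's WKS index. Inserting one entry into the WKS table shifts every later index, so hints written under the old table are resolved against the wrong keys. Rebuilding the hints from the field names, which is the kind of 4.2.1 fixup described above, makes the stored indices irrelevant.

```python
# Hypothetical well-known-string (WKS) tables, not the real ATS ones.
# Inserting "Upgrade" shifts the index of every later entry, including "Vary".
WKS_OLD = ["Accept", "Content-Type", "Vary"]
WKS_NEW = ["Accept", "Content-Type", "Upgrade", "Vary"]


class CachedHeader:
    """Header as marshalled to disk: (name, value) fields in slot order,
    plus hints mapping a field's WKS index to its slot."""

    def __init__(self, fields, wks_table):
        self.fields = list(fields)
        self.hints = self.build_hints(wks_table)

    def build_hints(self, wks_table):
        # WKS index -> slot for every well-known field present in the header.
        return {wks_table.index(name): slot
                for slot, (name, _) in enumerate(self.fields)
                if name in wks_table}

    def get(self, name, wks_table):
        # Trust the hints, resolving the name with the *current* table.
        slot = self.hints.get(wks_table.index(name))
        return None if slot is None else self.fields[slot][1]


def fixup_hints(header, wks_table):
    """Ignore the stored hints and rebuild them from the field names,
    so only the in-memory WKS table matters."""
    header.hints = header.build_hints(wks_table)


# Written by the old release, read back by the new one.
hdr = CachedHeader([("Content-Type", "text/html"), ("Vary", "Accept")], WKS_OLD)
vary_before = hdr.get("Vary", WKS_NEW)        # None: Vary's hint sits under old index 2
upgrade_before = hdr.get("Upgrade", WKS_NEW)  # "Accept": Vary's data under the wrong field
fixup_hints(hdr, WKS_NEW)
vary_after = hdr.get("Vary", WKS_NEW)         # "Accept"
upgrade_after = hdr.get("Upgrade", WKS_NEW)   # None
```

In the real bug the bad lookup could land on null or unallocated memory rather than on another field's value; the toy only shows how a shifted index silently redirects a lookup.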

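The RAM-cache race and the fix Alan proposes can be modelled in a few lines of Python. This is a hedged sketch, not the actual patch: `CacheObject`, `fixup_steps`, `load_from_disk`, `ram_cache_hit`, `read_field`, and `FIXED_IN` are all hypothetical, and hints are keyed by field name for brevity. The fixup is written as a generator that pauses between clearing and rewriting the hints, so the interleaving of two transactions can be replayed deterministically instead of with real threads. One transaction checks that a field is present, the other clears the hints, and the first transaction's lookup then fails. Skipping the fixup on RAM-cache hits removes that second writer from the shared path.

```python
FIXED_IN = (4, 2, 1)  # hypothetical: first cache version whose hints are trusted
FIELDS = [("Content-Type", "text/html"), ("Vary", "Accept")]


class CacheObject:
    """Hypothetical cached object: (name, value) fields plus name -> slot hints."""

    def __init__(self, cache_version, fields, stored_hints=None):
        self.cache_version = cache_version
        self.fields = list(fields)
        self.hints = dict(stored_hints or {})  # shared by all RAM-cache readers


def fixup_steps(obj, from_ram_cache, guard=True):
    """Clear-and-rewrite fixup as a generator, paused where another
    transaction could observe the object. With guard=True it does nothing
    for RAM-cache hits, since the disk read already fixed the object up."""
    if obj.cache_version >= FIXED_IN or (guard and from_ram_cache):
        return
    obj.hints.clear()
    yield  # the hints are empty at this point
    obj.hints.update({name: slot for slot, (name, _) in enumerate(obj.fields)})


def load_from_disk(version, fields, stored_hints=None):
    """Disk read: the object is private until the fixup has finished."""
    obj = CacheObject(version, fields, stored_hints)
    for _ in fixup_steps(obj, from_ram_cache=False):
        pass
    return obj


def ram_cache_hit(obj, guard):
    """A second transaction hitting the same object in the RAM cache.
    Calling the returned function runs its fixup up to the pause point."""
    steps = fixup_steps(obj, from_ram_cache=True, guard=guard)
    return lambda: next(steps, None)


def read_field(obj, name, other_txn=None):
    """Check-then-use lookup; other_txn runs between the check and the use."""
    if name not in obj.hints:      # check: is the field present?
        return None
    if other_txn is not None:
        other_txn()                # the other transaction is scheduled here
    slot = obj.hints.get(name)     # use: assumed to succeed after the check
    if slot is None:               # stands in for the null pointer dereference
        raise LookupError(f"{name!r} passed the check but was not found")
    return obj.fields[slot][1]


# Unguarded: the RAM-cache hit clears the hints mid-lookup.
shared = load_from_disk((4, 2, 0), FIELDS, stored_hints={"Vary": 0})
try:
    read_field(shared, "Vary", other_txn=ram_cache_hit(shared, guard=False))
    crashed = False
except LookupError:
    crashed = True

# Guarded: RAM-cache hits skip the fixup, so the lookup succeeds.
fresh = load_from_disk((4, 2, 0), FIELDS, stored_hints={"Vary": 0})
value = read_field(fresh, "Vary", other_txn=ram_cache_hit(fresh, guard=True))
```

The disk path still rewrites the stale stored hints (`{"Vary": 0}` becomes a fresh map) before the object can be shared, which is why that path needs no lock in this model. How the real patch detects a RAM-cache hit is not shown here.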
