Date: Wed, 16 Apr 2014 10:18:03 -0500
From: "Alan M. Carroll"
Organization: Network Geographics, Inc.
To: Phil Sorber
Subject: Re: [VOTE] Release Apache Traffic Server 4.2.1 (RC0)

I was asked for a translation of my previous email, bonging the 4.2.1 RC0.

The problem in 4.2.0 was a shift in the set of WKS (well-known string) values. These are not just live in-memory data; they are also written to the cache in the object headers, so if they change at all, it de facto invalidates the cache. The 4.2.0 crashes (TS-2564) are due to this, because various secondary bits of data get written inconsistently, which in turn causes ATS to look up the wrong data for header fields. For instance, the VARY field would be written out along with a hint about where it was in the header. When the object was read back in 4.2.0, ATS would use the stored WKS index to look up the hint location, get the wrong location (because VARY had shifted), and use that to find the wrong data for VARY (possibly null or unallocated memory).

To fix this, 4.2.1 simply clears all the hints and rewrites them when the object is read from disk, if the object was written with a cache version earlier than 4.2.1. This ignores the stored values and uses only the current in-memory values.

However, it turns out that when the object is read from disk, it may be stored in the ram cache. If retrieved from the ram cache later, it goes through the same logic as if it had been loaded from disk, which includes clearing and rewriting the hints. The ATS logic, though, doesn't lock the object for this, because the object is expected to be read-only once it has been read from disk. The TS-2564 fixup logic violates this expectation and thereby creates a race condition between two transactions both accessing the same object.
It is possible for one transaction to check the valid hints for a field and then, while it is trying to retrieve the field, for the other transaction to clear the hints, causing the field to not be found. This leads to a crash because the logic assumes (reasonably) that if it has checked the hints and verified the field's presence, the field is present and will be found. If the field is not found, you get a null pointer dereference.

The solution is to prevent the 4.2.0 fixup from being done on objects retrieved from the ram cache. There's no need, as the fixup was done when the object was read from disk and put in the ram cache. There is no race condition for disk reads because those objects are not shared until after the fixup.
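
As a rough illustration of the two points above (a hint written under one WKS table being read under a shifted one, and the fixup being skipped for ram cache hits), here is a minimal Python sketch. It is a toy model, not ATS code: WKS_OLD, WKS_NEW, CachedHeader, write_hints, find, fixup, and load are all hypothetical names, the field layout is invented, and the cache version check and all threading are left out.

```python
from dataclasses import dataclass

# Assumption: two WKS tables that differ by one inserted entry, so every
# later index shifts by one. The real tables are much larger.
WKS_OLD = ["Accept", "Vary", "Via"]          # table the object was written with
WKS_NEW = ["Accept", "Age", "Vary", "Via"]   # table the reader uses


@dataclass
class CachedHeader:
    entries: list     # (name, value) pairs in stored order
    hints: list       # hints[wks_index] -> slot in `entries`, or None
    fixups: int = 0   # how many times the hints were rebuilt


def write_hints(entries, wks):
    """Build the hint table the way a writer using `wks` would."""
    slots = {name: i for i, (name, _) in enumerate(entries)}
    return [slots.get(name) for name in wks]


def find(header, name, wks):
    """Look up a field through the hint stored at its WKS index."""
    idx = wks.index(name)
    slot = header.hints[idx] if idx < len(header.hints) else None
    return None if slot is None else header.entries[slot]


def fixup(header, wks):
    """Discard the stored hints and rebuild them from the current table."""
    header.hints = write_hints(header.entries, wks)
    header.fixups += 1


def load(header, wks, from_ram_cache):
    """Run the fixup only for disk reads, before the object is shared."""
    if not from_ram_cache:
        fixup(header, wks)
    return header


entries = [("Via", "proxy"), ("Vary", "Accept-Encoding")]
stored = CachedHeader(entries, write_hints(entries, WKS_OLD))

stale = find(stored, "Vary", WKS_NEW)         # follows the hint written for "Via"
load(stored, WKS_NEW, from_ram_cache=False)   # disk read: hints rebuilt once
fresh = find(stored, "Vary", WKS_NEW)         # now resolves to the Vary entry
load(stored, WKS_NEW, from_ram_cache=True)    # ram cache hit: object untouched
```

In this model the ram cache path never mutates the object, which is the property the unlocked readers rely on. A second fixup on a ram cache hit would reassign `hints` while another reader might be between its hint check and its field lookup.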