Date: Thu, 8 Sep 2016 20:07:20 +0000 (UTC)
From: "Ivan Bella (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-4391) Source deepcopies cannot be used safely in separate threads in tserver

    [ https://issues.apache.org/jira/browse/ACCUMULO-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474860#comment-15474860 ]

Ivan Bella commented on ACCUMULO-4391:
--------------------------------------

[~kturner] There was a slew of different exceptions, mostly stemming from the same decompressor being used by multiple threads. They fell into three groups (a sketch of each fix follows the list):

1) Most of the exceptions came from the same decompressor being used across multiple threads. The root cause was that the same decompressor was being returned to the codec pool multiple times, so the pool later handed that one instance out to multiple threads, resulting in mass chaos. This is solved by ensuring the decompressor is returned only once, in the BCFile finish call.

2) The second set of issues stemmed from close being called while other threads were still reading from the FSDataInputStream. The close returned the decompressor to the pool underneath the reader, and the pool then handed it to another thread while a read was still in progress. This is solved by the synchronization added in BoundedRangeFileInputStream. The synchronization within CacheableBlockFile was added after examining the code, to ensure we never close the same FSDataInputStream concurrently with a read or close inside BoundedRangeFileInputStream.

3) The third problem was available being called on the FSDataInputStream concurrently with its close in BoundedRangeFileInputStream. I initially added synchronization there as well, but after our initial discussions on the pull request it was determined (and verified on our system) that available is really only used during initialization of the CompressionInputStream, whose result feeds the getPos call, which is not used in this context. Hence simply avoiding the underlying available call seemed the best course of action.
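A minimal sketch of the idempotent return described in (1). The holder class is hypothetical, not the actual BCFile change; CodecPool and Decompressor are the real Hadoop classes:

import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;

// Hypothetical holder illustrating the fix: null out the reference so a
// second finish()/close() cannot return the same decompressor twice.
class DecompressorHolder {
    private Decompressor decompressor;

    DecompressorHolder(Decompressor decompressor) {
        this.decompressor = decompressor;
    }

    // Idempotent: the decompressor goes back to the pool exactly once, so
    // the pool can never hand the same instance out to two threads.
    synchronized void returnToPool() {
        if (decompressor != null) {
            CodecPool.returnDecompressor(decompressor);
            decompressor = null;
        }
    }
}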
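A sketch of the locking pattern in (2); class and field names are illustrative, not the actual Accumulo patch. Every operation that touches the shared underlying stream synchronizes on that stream, so a close can never interleave with a positioned read:

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative bounded-range wrapper: read() and close() take the same
// monitor (the shared stream), so they are mutually exclusive.
class BoundedRangeStreamSketch extends InputStream {
    private final FSDataInputStream in; // shared across deep copies
    private long pos;
    private final long end;

    BoundedRangeStreamSketch(FSDataInputStream in, long offset, long length) {
        this.in = in;
        this.pos = offset;
        this.end = offset + length;
    }

    @Override
    public int read() throws IOException {
        if (pos >= end) {
            return -1;
        }
        synchronized (in) { // same lock used by close()
            in.seek(pos);
            int b = in.read();
            if (b >= 0) {
                pos++;
            }
            return b;
        }
    }

    @Override
    public void close() throws IOException {
        synchronized (in) { // cannot run while a read() holds the lock
            // per-stream cleanup (e.g. returning the decompressor) happens
            // here; the shared stream itself is deliberately left open
        }
    }
}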
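Finally, for (3), one way to keep available() from ever reaching the shared stream is to answer it from the range bookkeeping alone. This wrapper is hypothetical (the actual change simply avoids the underlying call); 'remaining' stands in for whatever tracks the bytes left in the range:

import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.function.LongSupplier;

// Hypothetical wrapper: available() is computed locally and never delegated
// to the shared FSDataInputStream, so it cannot race with a concurrent close.
class NonDelegatingAvailableStream extends FilterInputStream {
    private final LongSupplier remaining; // bytes left in this stream's range

    NonDelegatingAvailableStream(InputStream in, LongSupplier remaining) {
        super(in);
        this.remaining = remaining;
    }

    @Override
    public int available() {
        // Only Hadoop's CompressionInputStream initialization consumed this
        // value (to feed getPos(), which is unused here), so a local answer
        // is sufficient and thread-safe.
        return (int) Math.min(Integer.MAX_VALUE, remaining.getAsLong());
    }
}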
> Source deepcopies cannot be used safely in separate threads in tserver
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4391
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.6.5
>            Reporter: Ivan Bella
>            Assignee: Ivan Bella
>             Fix For: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>
>   Original Estimate: 24h
>          Time Spent: 11h 40m
>  Remaining Estimate: 12h 20m
>
> We have iterators that create deep copies of the source and use them in separate threads. As it turns out this is not safe, and we end up with many exceptions, mostly down in the ZlibDecompressor library. Curiously, if you turn on the data cache for the table being scanned, the errors disappear.
> After much hunting it turns out that the real bug is in the BoundedRangeFileInputStream. The read() method therein appropriately synchronizes on the underlying FSDataInputStream; however, the available() method does not. Adding similar synchronization on that stream fixes the issues. On a side note, the available() call is only invoked within the hadoop CompressionInputStream for use in the getPos() call. That call does not appear to actually be used, at least in this context.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)