Date: Tue, 22 Mar 2016 09:58:25 +0000 (UTC)
From: "Vladimir Ozerov (JIRA)"
To: dev@ignite.apache.org
Subject: [jira] [Created] (IGNITE-2876) IGFS: System pool starvation is possible during data block write.

Vladimir Ozerov created IGNITE-2876:
---------------------------------------

             Summary: IGFS: System pool starvation is possible during data block write.
                 Key: IGNITE-2876
                 URL: https://issues.apache.org/jira/browse/IGNITE-2876
             Project: Ignite
          Issue Type: Bug
          Components: IGFS
    Affects Versions: 1.5.0.final
            Reporter: Vladimir Ozerov
            Assignee: Ivan Veselovsky
            Priority: Critical
             Fix For: 1.6

*Problem*
IGFS has a set of messages to exchange data and signal events between nodes.
These are:
- {{IgfsAckMessage}}
- {{IgfsBlocksMessage}}
- {{IgfsDeleteMessage}}
- {{IgfsFragmentizerRequest}}
- {{IgfsFragmentizerResponse}}

Currently these messages are processed in the system pool, which is wrong and may lead to starvation, deadlocks, and incorrect behavior. Several examples:
1) {{IgfsBlocksMessage}} handling logic performs a "Cache.putAsync" operation. This operation involves acquiring a semaphore permit. This semaphore, in turn, can only be released from another thread in the same system pool. As such, all system pool threads could hang on permit acquisition forever.
2) If the file system size is exceeded, the same message waits for some time in the hope that free space will appear in the cache. However, if all system pool threads are waiting at this point, concurrent block removal cannot proceed, so these threads are doomed to receive {{IgfsOutOfSpaceException}} irrespective of whether they wait or not.

*Solution*
1) Introduce a new IO policy for IGFS (see {{GridIoPolicy}}).
2) Force all IGFS messages to be processed with this policy.

No backward compatibility is needed.
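
The starvation described in the first example can be reproduced outside Ignite with plain {{java.util.concurrent}} classes. The following is a minimal sketch, not Ignite code: the class {{PoolStarvationSketch}}, the method {{handlersFinish}}, and the pool size are hypothetical names chosen for illustration. Each handler waits for a semaphore permit whose release is queued on the "system" pool. When the handlers themselves occupy every system pool thread, the release tasks never run. When the handlers run in a dedicated pool, they do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolStarvationSketch {
    /** Hypothetical pool size; one handler is started per thread. */
    static final int THREADS = 2;

    /**
     * Runs THREADS handlers. Each handler queues a permit release on the
     * system pool and then blocks until a permit is available.
     *
     * @param dedicatedPool If true, handlers run in their own pool;
     *                      if false, they run in the system pool itself.
     * @return true if every handler finished within one second.
     */
    static boolean handlersFinish(boolean dedicatedPool) {
        ExecutorService systemPool = Executors.newFixedThreadPool(THREADS);
        ExecutorService handlerPool =
            dedicatedPool ? Executors.newFixedThreadPool(THREADS) : systemPool;

        Semaphore permits = new Semaphore(0);
        CountDownLatch allStarted = new CountDownLatch(THREADS);
        CountDownLatch done = new CountDownLatch(THREADS);

        try {
            for (int i = 0; i < THREADS; i++) {
                handlerPool.execute(() -> {
                    try {
                        // Wait until every handler thread is busy, so the
                        // outcome does not depend on scheduling order.
                        allStarted.countDown();
                        allStarted.await();

                        // The permit can only be released by a system pool thread.
                        systemPool.execute(permits::release);

                        // Blocks forever if no system pool thread is free.
                        permits.acquire();
                        done.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return done.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt any handlers that are still blocked.
            handlerPool.shutdownNow();
            systemPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool finishes:    " + handlersFinish(false));
        System.out.println("dedicated pool finishes: " + handlersFinish(true));
    }
}
```

In this sketch the shared-pool run is expected to time out, because both pool threads sit in {{acquire()}} while the release tasks wait in the queue behind them. The dedicated-pool run is expected to complete, which mirrors the intent of processing IGFS messages under a separate IO policy.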