Date: Thu, 15 Oct 2015 16:11:05 +0000 (UTC)
From: "Ivan Veselovsky (JIRA)"
To: dev@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Created] (IGNITE-1697) IGFS: implement reliable Igfs failover logic

Ivan Veselovsky created IGNITE-1697:
---------------------------------------

             Summary: IGFS: implement reliable Igfs failover logic
                 Key: IGNITE-1697
                 URL: https://issues.apache.org/jira/browse/IGNITE-1697
             Project: Ignite
          Issue Type: Bug
            Reporter: Ivan Veselovsky
            Assignee: Ivan Veselovsky
             Fix For: 1.5

Problems to solve:
1) Currently a write lock on a file may stay held forever if a node takes the lock and then crashes.
2) Currently the blocks of file content are not written as plain dataCache.put() operations, but are sent using ad-hoc async messages. This was done earlier to improve performance.
But in order to implement reliable failover we need to get rid of that and use simple put() or asyncPut() cache operations.

Solution plan:
1) Use async put to write file data blocks.
2) Perform writes using the scheme "lock" -> "reserve space" -> "write" -> "commit" -> "release lock".
3) The id of the node that locked a file should be readable from the lock id.
4) Upon taking a file lock, the following procedure should be performed: if the file is already locked, take the id of the node that locked it and ask DiscoveryProcessor whether this node is alive. If it is not (the node has left the topology), perform the cleanup procedure: delete all the data blocks of the reserved data range, then delete the lock.
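
The lock-taking procedure in item 4 could be sketched roughly as follows. This is an illustration only, not the IGFS implementation: plain in-memory maps stand in for the meta and data caches, a set of node ids stands in for DiscoveryProcessor, and every name here (IgfsLockFailoverSketch, FileLock, tryLock, blockKey, and so on) is hypothetical. For brevity, "lock" and "reserve space" are merged into one step, and "commit" and "release lock" into another.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Sketch only: in-memory stand-ins for the caches and for node discovery. */
final class IgfsLockFailoverSketch {
    /** Hypothetical lock record; the owner node id is readable from it (plan item 3). */
    static final class FileLock {
        final UUID ownerNodeId;
        final long reservedFrom; // first reserved block index, inclusive
        final long reservedTo;   // end of the reserved range, exclusive

        FileLock(UUID ownerNodeId, long reservedFrom, long reservedTo) {
            this.ownerNodeId = ownerNodeId;
            this.reservedFrom = reservedFrom;
            this.reservedTo = reservedTo;
        }
    }

    final Map<String, FileLock> locks = new HashMap<>();    // file path -> lock
    final Map<String, byte[]> dataBlocks = new HashMap<>(); // block key -> block content
    final Set<UUID> aliveNodes = new HashSet<>();           // assumed stand-in for discovery

    static String blockKey(String path, long idx) {
        return path + "#" + idx;
    }

    /** "lock" + "reserve space": returns true if the requester now holds the lock. */
    boolean tryLock(String path, UUID requester, long from, long to) {
        FileLock existing = locks.get(path);
        if (existing != null) {
            if (aliveNodes.contains(existing.ownerNodeId))
                return false; // the owner is alive, so the lock is still valid

            // The owner has left the topology: delete the blocks of its
            // reserved range first, then delete the stale lock.
            for (long i = existing.reservedFrom; i < existing.reservedTo; i++)
                dataBlocks.remove(blockKey(path, i));
            locks.remove(path);
        }
        locks.put(path, new FileLock(requester, from, to));
        return true;
    }

    /** "write": only the lock owner may write, and only inside its reserved range. */
    void write(String path, UUID writer, long idx, byte[] data) {
        FileLock lock = locks.get(path);
        if (lock == null || !lock.ownerNodeId.equals(writer)
            || idx < lock.reservedFrom || idx >= lock.reservedTo)
            throw new IllegalStateException("Block " + idx + " is not reserved by " + writer);
        dataBlocks.put(blockKey(path, idx), data);
    }

    /** "commit" + "release lock": written blocks stay, the lock is removed. */
    void commitAndRelease(String path, UUID writer) {
        FileLock lock = locks.get(path);
        if (lock == null || !lock.ownerNodeId.equals(writer))
            throw new IllegalStateException("Lock is not held by " + writer);
        locks.remove(path);
    }

    public static void main(String[] args) {
        IgfsLockFailoverSketch fs = new IgfsLockFailoverSketch();
        UUID nodeA = UUID.randomUUID();
        UUID nodeB = UUID.randomUUID();
        fs.aliveNodes.add(nodeA);
        fs.aliveNodes.add(nodeB);

        fs.tryLock("/file", nodeA, 0, 2);
        fs.write("/file", nodeA, 0, new byte[] {1});
        fs.aliveNodes.remove(nodeA); // simulate a crash of node A before commit

        System.out.println("B acquired lock: " + fs.tryLock("/file", nodeB, 2, 4));
        System.out.println("Stale block present: "
            + fs.dataBlocks.containsKey(blockKey("/file", 0)));
    }
}
```

In this sketch the stale lock is detected lazily, by the next node that tries to lock the file, which matches the "upon taking a file lock" wording of the plan. A real implementation would also need the check-and-cleanup to be atomic with respect to concurrent lockers, which the sketch does not attempt.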