From: steveloughran
To: reviews@spark.apache.org
Subject: [GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...
Date: Wed, 29 Aug 2018 11:17:00 +0000 (UTC)

Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186

    This will eliminate a race condition between FS shutdown (in the Hadoop shutdown manager) and the Hive callback. There is a risk today that the filesystems will be closed before the event log close()/rename() is called, so the log doesn't get saved, and this can happen with any FS.

    Registering the shutdown hook via the Spark APIs, with a priority higher than the FS shutdown hook, guarantees that it will be called before the FS shutdown. But it doesn't guarantee that the operation will complete within the 10s time limit hard-coded into Hadoop 2.8.x+ for any single shutdown hook. It is going to work on HDFS, except in the special case of an HDFS NN lock or GC pause.

    The configurable delay of [HADOOP-15679](https://issues.apache.org/jira/browse/HADOOP-15679) needs to go in. I've increased the default timeout to 30s there for more forgiveness with HDFS; for object stores with O(data) renames, people should configure a timeout of minutes, or, if they want to turn it off altogether, hours. I'm backporting HADOOP-15679 to all branches 2.8.x+, so every Hadoop version with that timeout will have it configurable and the default extended.
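    A minimal sketch of the registration described above, using Spark's internal `org.apache.spark.util.ShutdownHookManager` (it is `private[spark]`, so this only compiles from Spark's own packages). The priority value and the `closeAndRenameEventLog()` helper are illustrative assumptions, not code from the PR:

    ```scala
    import org.apache.spark.util.ShutdownHookManager

    // Hypothetical placeholder for the real work: flush, close() and rename
    // the in-progress event log before the filesystems disappear.
    def closeAndRenameEventLog(): Unit = {
      // e.g. stop the event-log writer, then rename the .inprogress file
    }

    // Hooks registered through Spark's manager run inside a single Hadoop
    // shutdown hook prioritised above FileSystem.SHUTDOWN_HOOK_PRIORITY, so
    // they execute before the filesystems are closed. 150 is an arbitrary
    // illustrative priority, not a Spark constant.
    val hookRef = ShutdownHookManager.addShutdownHook(150) { () =>
      closeAndRenameEventLog()
    }
    ```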
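    For the timeout side, Hadoop 2.8.x+ lets a hook be registered directly with `org.apache.hadoop.util.ShutdownHookManager` with its own per-hook timeout, rather than relying on the previously hard-coded 10s limit; HADOOP-15679 then makes the global default configurable. A hedged sketch, reusing the same hypothetical `closeAndRenameEventLog()` placeholder; the ten-minute figure is just an example for an object store with O(data) renames:

    ```scala
    import java.util.concurrent.TimeUnit

    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.util.ShutdownHookManager

    // Register above the FS shutdown hook (FileSystem.SHUTDOWN_HOOK_PRIORITY == 10)
    // and give this hook an explicit timeout of its own.
    ShutdownHookManager.get().addShutdownHook(
      new Runnable {
        override def run(): Unit = closeAndRenameEventLog() // placeholder, as above
      },
      FileSystem.SHUTDOWN_HOOK_PRIORITY + 30,
      10, TimeUnit.MINUTES)

    // With HADOOP-15679 applied, hooks without an explicit timeout pick up the
    // default from hadoop.service.shutdown.timeout in core-site.xml (30s by
    // default); object-store users can raise it to minutes or hours there.
    ```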