spark-reviews mailing list archives

From steveloughran <...@git.apache.org>
Subject [GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...
Date Wed, 29 Aug 2018 11:17:00 GMT
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    This will eliminate a race condition between FS shutdown (in the Hadoop
    shutdown manager) and the Hive callback. There's a risk today that the
    filesystems will be closed before the event log close()/rename() is called,
    so the log doesn't get saved, and this can happen with any FS.
    
    Registering the shutdown hook via the Spark APIs, with a priority higher
    than that of the FS shutdown hook, guarantees that it is called before the
    filesystems are closed. It does not, however, guarantee that the operation
    completes within the 10s time limit hard-coded into Hadoop 2.8.x+ for any
    single shutdown hook. That is going to work for HDFS, except in the special
    cases of an HDFS NN lockup or a long GC pause.
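
    A minimal sketch of what that registration could look like, using the
    Spark-internal `ShutdownHookManager` (it is `private[spark]`, so this only
    compiles inside the Spark tree); `closeAndRenameEventLog()` is a
    hypothetical stand-in for the actual close()/rename() sequence:

    ```scala
    import org.apache.spark.util.ShutdownHookManager

    // Hypothetical stand-in for the real event-log close()/rename() work.
    def closeAndRenameEventLog(): Unit = { /* flush, close, rename to final name */ }

    // Hadoop's FileSystem hook runs at FileSystem.SHUTDOWN_HOOK_PRIORITY (10),
    // so a Spark hook with a higher priority runs before the FS close.
    ShutdownHookManager.addShutdownHook(
        ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
      closeAndRenameEventLog()
    }
    ```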
    
    The configurable shutdown delay of [HADOOP-15679](https://issues.apache.org/jira/browse/HADOOP-15679)
    needs to go in. I've increased the default timeout there to 30s, for more
    forgiveness with HDFS; for object stores with O(data) renames, people
    should configure a timeout of minutes, or, if they want to effectively
    turn the limit off, hours.
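
    For illustration, a sketch of tuning that limit, assuming the key
    HADOOP-15679 adds is `hadoop.service.shutdown.timeout` and that values are
    Hadoop time durations ("30s", "5m", ...). In a real deployment this belongs
    in core-site.xml so the shutdown manager picks it up; the programmatic form
    below just shows the key and value:

    ```scala
    import org.apache.hadoop.conf.Configuration

    val conf = new Configuration()
    // Give slow O(data) object-store renames time to finish during shutdown.
    conf.set("hadoop.service.shutdown.timeout", "5m")
    ```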
    
    I'm backporting HADOOP-15679 to all branches from 2.8.x up, so every Hadoop
    release with that timeout will have it configurable and the default
    extended.

