impala-commits mailing list archives

Subject [3/8] impala git commit: IMPALA-6232: Disable file handle cache by default
Date Wed, 06 Dec 2017 01:56:00 GMT
IMPALA-6232: Disable file handle cache by default

There are scenarios where HDFS file appends or HDFS file
overwrites can lead to HDFS disabling short circuit reads.
Since this can be a performance regression, this changes
the default value for max_cached_file_handles to 0 to
disable the file handle cache by default. This also changes
the default value for unused_file_handle_timeout_sec to 270.
If users enable the file handle cache, this setting will
prevent some of the scenarios that disable short circuit
reads.

Ran existing file handle cache tests to verify that there
is no impact.

Change-Id: Iea7f943f63b72b42286a9e8b9987308baa79d7b0
Reviewed-by: Joe McDonnell <>
Tested-by: Impala Public Jenkins


Branch: refs/heads/master
Commit: 1f1bff8e8d35b66308a1e865cdc8bce41ce89873
Parents: e4a2f5d
Author: Joe McDonnell <>
Authored: Mon Dec 4 10:21:33 2017 -0800
Committer: Impala Public Jenkins <>
Committed: Tue Dec 5 21:03:00 2017 +0000

 be/src/runtime/io/ | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/be/src/runtime/io/ b/be/src/runtime/io/
index 4f4074c..668fc75 100644
--- a/be/src/runtime/io/
+++ b/be/src/runtime/io/
@@ -94,7 +94,10 @@ DEFINE_int32(max_free_io_buffers, 128,
 // uses about 6kB of memory. 20k file handles will thus reserve ~120MB of memory.
 // The actual amount of memory that is associated with a file handle can be larger
 // or smaller, depending on the replication factor for this file or the path name.
-DEFINE_uint64(max_cached_file_handles, 20000, "Maximum number of HDFS file handles "
+// TODO: This is currently disabled due to HDFS-12528, which can disable short circuit
+// reads when file handle caching is enabled. This should be reenabled by default
+// when that issue is fixed.
+DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
     "that will be cached. Disabled if set to 0.");
 // The unused file handle timeout specifies how long a file handle will remain in the
@@ -106,7 +109,10 @@ DEFINE_uint64(max_cached_file_handles, 20000, "Maximum number of HDFS file handl
 // from being freed. When the metadata sees that a file has been deleted, the file handle
 // will no longer be used by future queries. Aging out this file handle allows the
 // disk space to be freed in an appropriate period of time.
-DEFINE_uint64(unused_file_handle_timeout_sec, 21600, "Maximum time, in seconds, that an "
+// TODO: HDFS-12528 (which can disable short circuit reads) is more likely to happen
+// if file handles are cached for longer than 5 minutes. Use a conservative value for
+// the unused file handle cache timeout until HDFS-12528 is fixed.
+DEFINE_uint64(unused_file_handle_timeout_sec, 270, "Maximum time, in seconds, that an "
     "unused HDFS file handle will remain in the file handle cache. Disabled if set "
     "to 0.");

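The numbers in the comments above can be checked with a small standalone sketch. It is not part of the commit or of Impala: the helper names and the constant kApproxBytesPerHandle are hypothetical. The 6kB-per-handle figure and the 5-minute window are taken from the code comments in the diff and are treated here as rough assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Assumption from the diff comment: a cached file handle uses about 6kB.
constexpr uint64_t kApproxBytesPerHandle = 6 * 1000;

// Hypothetical helper: rough memory reserved by a cache of the given size.
constexpr uint64_t EstimateCacheBytes(uint64_t max_cached_file_handles) {
  return max_cached_file_handles * kApproxBytesPerHandle;
}

// Hypothetical helper: per the flag help text, 0 disables the cache.
constexpr bool CacheEnabled(uint64_t max_cached_file_handles) {
  return max_cached_file_handles != 0;
}

// Hypothetical helper: true if unused handles age out in under 5 minutes,
// the window the diff comment associates with HDFS-12528. A timeout of 0
// disables aging, so it does not count as being below the window.
constexpr bool TimeoutBelowFiveMinutes(uint64_t unused_file_handle_timeout_sec) {
  return unused_file_handle_timeout_sec != 0 &&
         unused_file_handle_timeout_sec < 5 * 60;
}

// Old defaults: 20000 handles (~120MB), 21600 second (6 hour) timeout.
static_assert(EstimateCacheBytes(20000) == 120000000ULL, "~120MB");
static_assert(!TimeoutBelowFiveMinutes(21600), "old timeout exceeds 5 min");

// New defaults: cache disabled, 270 second (4.5 minute) timeout.
static_assert(!CacheEnabled(0), "cache off by default");
static_assert(TimeoutBelowFiveMinutes(270), "new timeout is under 5 min");
```

Because both settings are gflags, a deployment that wants the cache back would presumably pass something like --max_cached_file_handles=20000 at daemon startup while leaving unused_file_handle_timeout_sec at its new default of 270. That invocation is an illustration, not something stated in the commit.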