hive-issues mailing list archives

From "Karen Coppage (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-24168) Disable hdfsEncryptionShims cache during query-based compaction
Date Fri, 18 Sep 2020 13:42:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-24168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Coppage updated HIVE-24168:
---------------------------------
    Description: 
Hive keeps a cache of encryption shims in SessionState
(Map<URI, HadoopShims.HdfsEncryptionShim> hdfsEncryptionShims). Each encryption shim in the
cache stores a FileSystem object.
After a compaction in which the session user differs from the owner of the partition/table
directory, we close all FileSystem objects associated with the user running the compaction,
possibly closing a FileSystem stored in the encryption shim cache. The next time query-based
compaction is run on a table/partition owned by the same user, compaction fails in MoveTask[1]
because the FileSystem stored in the cache has already been closed.
EDIT: This change adds an option to disable the cache, and disables the cache during
query-based compaction.
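
The failure mode described above can be reproduced in miniature without Hadoop. The following is only a sketch: FakeFileSystem, FakeEncryptionShim, FakeSession and runCompaction are hypothetical stand-ins invented for illustration, not Hive or Hadoop classes, and the cacheEnabled flag is an assumed simplification of the option this change introduces.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class StaleShimCacheDemo {

    /** Hypothetical stand-in for a Hadoop FileSystem: usable until closed. */
    static final class FakeFileSystem implements Closeable {
        private boolean closed;

        void checkOpen() throws IOException {
            if (closed) {
                throw new IOException("Filesystem closed");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for an encryption shim: it keeps a FileSystem reference. */
    static final class FakeEncryptionShim {
        private final FakeFileSystem fs;

        FakeEncryptionShim(FakeFileSystem fs) {
            this.fs = fs;
        }

        boolean isPathEncrypted(String path) throws IOException {
            fs.checkOpen(); // fails if the stored FileSystem was closed elsewhere
            return false;
        }
    }

    /** Hypothetical stand-in for the session-level shim cache keyed by URI. */
    static final class FakeSession {
        private final Map<URI, FakeEncryptionShim> shims = new HashMap<>();
        private final boolean cacheEnabled;

        FakeSession(boolean cacheEnabled) {
            this.cacheEnabled = cacheEnabled;
        }

        FakeEncryptionShim getShim(URI uri, FakeFileSystem fs) {
            if (!cacheEnabled) {
                return new FakeEncryptionShim(fs); // always bound to the caller's FileSystem
            }
            // With the cache on, a later call gets the shim built around an earlier FileSystem.
            return shims.computeIfAbsent(uri, u -> new FakeEncryptionShim(fs));
        }
    }

    /** One simulated compaction: use the shim, then close this run's FileSystem. */
    static String runCompaction(FakeSession session, URI uri) {
        FakeFileSystem fs = new FakeFileSystem();
        try {
            session.getShim(uri, fs).isPathEncrypted("/warehouse/t");
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        } finally {
            fs.close(); // mimics closing the FileSystem objects of the compaction user
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("hdfs://nn:8020");

        FakeSession cached = new FakeSession(true);
        System.out.println(runCompaction(cached, uri)); // ok
        System.out.println(runCompaction(cached, uri)); // Filesystem closed

        FakeSession uncached = new FakeSession(false);
        System.out.println(runCompaction(uncached, uri)); // ok
        System.out.println(runCompaction(uncached, uri)); // ok
    }
}
```

In this model the second cached run fails because the shim still points at the FileSystem closed at the end of the first run, while the uncached session builds a fresh shim per run and keeps working.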

Workaround: set fs.hdfs.impl.disable.cache to true. This probably has other side effects,
at least a slowdown.

[1] Error:
{code:java}
2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: [rncdpdev-2.fyre.ibm.com-27]:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException:
Filesystem closed. org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
Filesystem closed
	at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477)
	at org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70)
	at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116)
	at org.apache.hadoop.hive.ql.txn.compactor.MmMajorQueryCompactor.runCompaction(MmMajorQueryCompactor.java:72)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:232)
	at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:221)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:218)
{code}

  was:
Hive keeps a cache of encryption shims in SessionState (Map<URI, HadoopShims.HdfsEncryptionShim>
hdfsEncryptionShims). Each encryption shim in the cache stores a FileSystem object.
After compaction where the session user is not the same user as the owner of the partition/table
directory, we close all FileSystem objects associated with the user running the compaction,
possibly closing an FS stored in the encryption shim cache. The next time query-based compaction
is run on a table/partition owned by the same user, compaction will fail in MoveTask[1] since
the FileSystem stored in the cache was closed.
This change disables the cache during query-based compaction (optionally; default: disabled).

[1] Error:
{code:java}
2020-09-08 11:23:50,170 ERROR org.apache.hadoop.hive.ql.Driver: [rncdpdev-2.fyre.ibm.com-27]:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException:
Filesystem closed. org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
Filesystem closed
	at org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:4637)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4147)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4694)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:3120)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:423)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:477)
	at org.apache.hadoop.hive.ql.DriverUtils.runOnDriver(DriverUtils.java:70)
	at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:116)
	at org.apache.hadoop.hive.ql.txn.compactor.MmMajorQueryCompactor.runCompaction(MmMajorQueryCompactor.java:72)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:232)
	at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:221)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:218)
{code}


> Disable hdfsEncryptionShims cache during query-based compaction
> ---------------------------------------------------------------
>
>                 Key: HIVE-24168
>                 URL: https://issues.apache.org/jira/browse/HIVE-24168
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)