hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Richard Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
Date Fri, 05 Jun 2015 23:13:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575389#comment-14575389
] 

Richard Williams commented on HIVE-10410:
-----------------------------------------

[~thejas] In the test case I use to reproduce this issue, I'm not sharing a session among
multiple queries or setting hive.exec.parallel=true. I am, however, using the same session
across multiple calls to the HiveServer2 Thrift API--in particular, I'm running many concurrent
processes that each open a session, execute an asynchronous operation, poll HiveServer2 for
the status of that operation until it completes, and then close their session. I think that
what's happening is that the foreground thread is continuing to talk to the metastore using
its Hive object while the pooled background thread is executing the asynchronous operation
(and thus also using the same Hive object).

Right, the issue you mentioned is what I was talking about in my earlier comment--the patch
I uploaded relies on the UserGroupInformation.doAs call in the submitted background operation
to ensure that the current user in the background thread is the same as the current user in
the foreground thread. Thus, when the background thread calls Hive.get, the call to isCurrentUserOwner
will return false if the existing MetaStoreClient is associated with the wrong user and a
new connection, this time associated with the correct user, will be created. Presumably the
code that does this is the safeguard you're referring to?

{noformat}
  public static Hive get(HiveConf c, boolean needsRefresh) throws HiveException {
    Hive db = hiveDB.get();
    if (db == null || needsRefresh || !db.isCurrentUserOwner()) {
      if (db != null) {
        LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + needsRefresh +
          ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
      }
      closeCurrent();
      c.set("fs.scheme.class", "dfs");
      Hive newdb = new Hive(c);
      hiveDB.set(newdb);
      return newdb;
    }

{noformat}

We haven't noticed any performance issues attributable to frequent reconnections to the metastore
to as a result of this change. Then again, we probably wouldn't; we're using Sentry and thus
have HiveServer2 impersonation disabled, so I would expect the current user to always be the
user running the HiveServer2 process. When I get a chance, I'll try changing Hive.createMetaStoreClient
to wrap the client it creates using HiveMetaStoreClient.newSynchronizedClient instead of removing
the code that sets the background thread's Hive object.

> Apparent race condition in HiveServer2 causing intermittent query failures
> --------------------------------------------------------------------------
>
>                 Key: HIVE-10410
>                 URL: https://issues.apache.org/jira/browse/HIVE-10410
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.13.1
>         Environment: CDH 5.3.3
> CentOS 6.4
>            Reporter: Richard Williams
>         Attachments: HIVE-10410.1.patch
>
>
> On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally
trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!"
or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore
calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions
in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and
thus causes the failure to bubble out to the JDBC client.
> Note that as far as we can tell, this issue appears to only affect queries that are submitted
with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean
all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP
transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce
the issue by spawning about 100 simple, concurrent hive queries (we have been using "show
databases"), two or three of which typically fail. However, when either of these conditions
do not hold, we are no longer able to reproduce the issue.
> Some example stack traces from the HiveServer2 logs:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException
Read a negative frame size (-2147418110)!
> org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)!
>         at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435)
>         at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
>         at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
>         at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The above exception being converted into a MetaException and re-thrown:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.ql.exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Got exception: org.apache.thrift.transport.TTransportException Read
a negative frame size (-2147418110)!)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1141)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException
Read a negative frame size (-2147418110)!)
>         at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1116)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:839)
>         at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         ... 22 more
> {noformat}
> The above MetaException causing the query as a whole to fail:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR org.apache.hive.service.cli.operation.Operation: Error
running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got
exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)!)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:148)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> An "out of sequence response" that occurred shortly after the above exception and may
have been triggered by it:
> {noformat}
> 2015-04-16 13:54:55,498 ERROR hive.log: Got exception: org.apache.thrift.TApplicationException
get_databases failed: out of sequence response
> org.apache.thrift.TApplicationException: get_databases failed: out of sequence response
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
>         at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message