hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
Date Fri, 05 Jun 2015 18:52:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575005#comment-14575005
] 

Thejas M Nair commented on HIVE-10410:
--------------------------------------

[~Richard Williams]
Are you sharing the same session for multiple queries in your cluster or set hive.exec.parallel=true
? That is only case where I think this would happen.
Also, there is a problem with the change in the patch. The HiveMetaStoreClient within Hive
object is associated with a specific user, so if it re-uses any Hive object from the current
thread, it would get the wrong user. There is a safeguard against this that was added recently,
but it would result in an expensive creation of new Hive object and (hive metastore client).
I think instead of pursing that option, we should just look at synchronizing thrift client
use within metastore client. There is already a HiveMetaStoreClient.newSynchronizedClient
that Hive object could use to get a synchronized client.

We also need to look at other issues like what [~ctang.ma] mentioned, but all of it does not
have to happen in this jira.



> Apparent race condition in HiveServer2 causing intermittent query failures
> --------------------------------------------------------------------------
>
>                 Key: HIVE-10410
>                 URL: https://issues.apache.org/jira/browse/HIVE-10410
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.13.1
>         Environment: CDH 5.3.3
> CentOS 6.4
>            Reporter: Richard Williams
>         Attachments: HIVE-10410.1.patch
>
>
> On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally
trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!"
or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore
calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions
in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and
thus causes the failure to bubble out to the JDBC client.
> Note that as far as we can tell, this issue appears to only affect queries that are submitted
with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean
all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP
transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce
the issue by spawning about 100 simple, concurrent hive queries (we have been using "show
databases"), two or three of which typically fail. However, when either of these conditions
do not hold, we are no longer able to reproduce the issue.
> Some example stack traces from the HiveServer2 logs:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException
Read a negative frame size (-2147418110)!
> org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)!
>         at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435)
>         at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
>         at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
>         at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The above exception being converted into a MetaException and re-thrown:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.ql.exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:Got exception: org.apache.thrift.transport.TTransportException Read
a negative frame size (-2147418110)!)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1141)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException
Read a negative frame size (-2147418110)!)
>         at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1116)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:839)
>         at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         ... 22 more
> {noformat}
> The above MetaException causing the query as a whole to fail:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR org.apache.hive.service.cli.operation.Operation: Error
running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got
exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)!)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:148)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> An "out of sequence response" that occurred shortly after the above exception and may
have been triggered by it:
> {noformat}
> 2015-04-16 13:54:55,498 ERROR hive.log: Got exception: org.apache.thrift.TApplicationException
get_databases failed: out of sequence response
> org.apache.thrift.TApplicationException: get_databases failed: out of sequence response
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
>         at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message