thrift-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mithun Radhakrishnan (JIRA)" <>
Subject [jira] [Commented] (THRIFT-4805) Fix logic of THRIFT-3769
Date Sat, 16 Feb 2019 01:05:00 GMT


Mithun Radhakrishnan commented on THRIFT-4805:

In the first patch [THRIFT-4805.1.patch|],
I aim to correct the behaviour introduced in THRIFT-3769.

One possible fault (in my patch above, and in the original fix to THRIFT-3769) is that all
{{TTransportExceptions}} are suppressed. But there might be legitimate instances where {{TTransportExceptions}}
would need to be bubbled upwards. [~thiruvel]'s original intention in THRIFT-2268 was to specifically
catch and suppress only the {{TTransportException.END_OF_FILE}} case, for {{TSaslServerTransport}},
as is evidenced by [the error-handling in {{TSaslTransport::open()}}|]:

      if (!readSaslHeader && e.getType() == TTransportException.END_OF_FILE) {
        LOGGER.debug("No data or no sasl data in the stream");
        throw new TSaslTransportException("No data or no sasl data in the stream during negotiation",
      throw e;

Apropos, I have a bonus patch that changes the blanket suppression of {{TTransportExceptions}}
to a more narrow {{END_OF_FILE}} case. I'll attach it herewith.

[~vihangk1], does it sound agreeable to narrow the focus down to this kind of failure? These
are pretty much what I see in production, with VIP/load-balancers, but I'd like to be sure
that this covers your cases as well.

> Fix logic of THRIFT-3769
> ------------------------
>                 Key: THRIFT-4805
>                 URL:
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.12.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>            Priority: Major
>         Attachments: THRIFT-4805.1.patch, THRIFT-4805.bonus.patch
> This has to do with the fix checked in for THRIFT-3769 and THRIFT-2268, which was to
mute the noise from \{{TSaslTransportException}}s raised from load-balancer health-checks
for Thrift services, such as the Hive metastore. Please consider [the code in TThreadPoolServer::run()|]:
> {code:java|}
>       } catch (TException tx) {
>         LOGGER.error("Thrift error occurred during processing of message.", tx);
>       } catch (Exception x) {
>         // We'll usually receive RuntimeException types here
>         // Need to unwrap to ascertain real causing exception before we choose to ignore
>         Throwable realCause = x.getCause();
>         // Ignore err-logging all transport-level/type exceptions
>         if ((realCause != null && realCause instanceof TTransportException)
>             || (x instanceof TTransportException)) {
>           LOGGER.debug(
>               "Received TTransportException during processing of message. Ignoring.",
>               x);
>         } else {
>           // Log the exception at error level and continue
>           LOGGER.error("Error occurred during processing of message.", x);
>         }
>       } finally {...}
> {code}
> The logic here is solid for {{RuntimeExceptions}} that wrap {{TTansportExceptions}}.
But it slips up on handling the condition being checked for at [line#323|],
> {code:java|}
>             || (x instanceof TTransportException)) {
> {code}
> The {{catch (Exception x)}} comes *after* the {{catch (TException tx)}}, so it's a guarantee
that {{!(x instanceof TTransportException)}}. When a {{TTransportException}} (or {{TSaslTransportException}})
is thrown, it will be caught in the first catch block, and logged. This rather defeats the
purpose of the fix. The error manifests with the following stack-trace filling up my logs:
> {noformat}
> org.apache.thrift.transport.TTransportException
>         at
>         at org.apache.thrift.transport.TTransport.readAll(
>         at org.apache.thrift.transport.TSaslTransport.readLength(
>         at org.apache.thrift.transport.TSaslTransport.readFrame(
>         at
>         at
>         at org.apache.thrift.transport.TTransport.readAll(
>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(
>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(
>         at org.apache.thrift.TBaseProcessor.process(
>         at org.apache.hive.service.cli.thrift.ThriftCLIMetricsProcessor.process(
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$
>         at Method)
>         at
>         at
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(
>         at org.apache.thrift.server.TThreadPoolServer$
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at
> {noformat}
> The fix is simple. I'll upload it shortly, for review.

This message was sent by Atlassian JIRA

View raw message