thrift-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "aqingsir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (THRIFT-4887) Thrift will OOM at a low concurrency if fields added and old client requests new server
Date Thu, 20 Jun 2019 03:39:00 GMT

    [ https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868225#comment-16868225
] 

aqingsir commented on THRIFT-4887:
----------------------------------

Totally agree with your comment #1: Never add/remove a field especially one that is in the
middle of a struct. And if there has to be such a case, make the field to be the last one
and marked as optional. This is what we advocate within our workmates.

However there still exists such a risk, especially in large companies, and this is why I
open this  issue.

For you comment of "the code should end up in {{Server.Execute()}} where any exception is
caught and the whole protocol/transport stack is reinitialized from scratch, including closing
sockets, reinstantiating a fresh framed transport or whatever else there might be", my opinion
is:

1. It's fine to reinitialize protocol/transport stack from scratch, but might not close sockets
as subsequent requests(maybe another thrift method that works fine) could re-use them. Opening
a lot sockets consume a lot resources and could easily lead to Socket Exceptions.

2. If this is an issue, it exist in both client side and server side, my fix is in client
side. I need to dive more into source code to see how to fix it in server side, probably
Server.Execute() as you suggest.

 

> Thrift will OOM at a low concurrency if fields added and old client requests new server
> ---------------------------------------------------------------------------------------
>
>                 Key: THRIFT-4887
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4887
>             Project: Thrift
>          Issue Type: Bug
>         Environment: Almost all versions from 0.8.0 to the newest 0.13.0-snapshot, and
verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
>            Reporter: aqingsir
>            Priority: Major
>         Attachments: readI32.jpg
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> (Could see  more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the changement
of some fields in an IDL file, old client still requested new server and crashed due to
OOM.
> IDL changement could be stated as: Return value of the interface is a list, element of
which is a struct object and has 5 fields. A new field is added to the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find a demo
project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are redundant
or have a type mismatch. But it won't consume subsequent fields if there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble comes to
the next request who reuses this connection, as the cursor still points to some middle position
of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means the length
of the method's name. Unbelievable the length could be as large as 184549632, which is about
176M. This explains why OOM occurs even at a concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the end of a
method call.
> I'll submit a PR soon for Java version.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message