From: "Jens Geyer (JIRA)"
To: dev@thrift.apache.org
Date: Fri, 14 Jun 2019 22:17:00 +0000 (UTC)
Subject: [jira] [Commented] (THRIFT-4887) Thrift will OOM at a low concurrency if fields added and old client requests new server

[ https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864485#comment-16864485 ]

Jens Geyer commented on THRIFT-4887:
------------------------------------

I
just did a quick check using netstd.

{quote}
Why there's an OOM issue
As we know, Thrift tries to consume all data in the inputstream by skipping fields that are redundant or have a type mismatch. At the same time, Thrift validates every struct object and throws an exception if it's invalid. These two behaviors conflict, because Thrift won't consume subsequent fields if there's an exception.
The current Thrift RPC request fails on such an exception, just as expected, but nothing is done to the underlying inputstream, which means some redundant data still remains and the cursor points to some middle position of the inputstream.
{quote}

That's not how it should work. The idea is that the validation throws and aborts the read; that's correct so far. But as a consequence, the code should end up in {{Server.Execute()}}, where any exception is caught and the whole protocol/transport stack is reinitialized from scratch. If that is no longer the case with Java, it should be corrected.

> Thrift will OOM at a low concurrency if fields added and old client requests new server
> ---------------------------------------------------------------------------------------
>
>                 Key: THRIFT-4887
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4887
>             Project: Thrift
>          Issue Type: Bug
>         Environment: Almost all versions from 0.8.0 to the newest 0.13.0-snapshot, and verified on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
>            Reporter: aqingsir
>            Priority: Major
>         Attachments: readI32.jpg
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> (See more at [https://github.com/aqingsao/thrift-oom])
> // background
> A serious issue occurred in our prod env, and it finally turned out to be caused by a change to some fields in an IDL file: an old client still requested the new server and crashed due to OOM.
> The IDL change can be summarized as follows: the return value of the interface is a list whose elements are struct objects with 5 fields.
> A new field was added in the middle of the struct.
> // to reproduce
> In this case, a low concurrency of 10 will reproduce this issue; you can find a demo project at: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in the inputstream by skipping fields that are redundant or have a type mismatch. But it won't consume subsequent fields if there's an exception.
> In such a case Thrift does nothing to the underlying inputstream, so trouble comes to the next request that reuses this connection, as the cursor still points to some middle position of the inputstream.
> Thrift always starts reading any response with readI32(), whose value is interpreted as the length of the method's name. Unbelievably, that length could be as large as 184549632, which is about 176M. This explains why OOM occurs even at a concurrency of 10.
> // how to fix
> Always clear the inputstream in TSocket if there is any redundant data at the end of a method call.
> I'll submit a PR soon for the Java version.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
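The 184549632 value reported in the issue is consistent with leftover field bytes being read as the method-name length of the next response. The minimal Java sketch below illustrates this under stated assumptions: it assumes the binary protocol and a reader that treats a non-negative first i32 as the method-name length, as the issue describes. It also assumes the four leftover bytes are a field header (type STRING, field id 1) followed by the first byte of that string's i32 length. The actual leftover bytes in the reporter's setup are not known. The class name, `MAX_NAME_LENGTH`, and `isSuspicious()` are hypothetical and are not part of Thrift.

```java
import java.nio.ByteBuffer;

public class LeftoverBytesDemo {

    // Thrift binary protocol type id for STRING fields (TType.STRING == 11).
    static final byte TYPE_STRING = 11;

    // Hypothetical sanity limit for a method-name length; illustration only,
    // not an actual Thrift setting.
    static final int MAX_NAME_LENGTH = 64 * 1024;

    // Interprets the first four bytes as a big-endian i32, which is how the
    // binary protocol encodes readI32() values on the wire.
    static int decodeLength(byte[] wire) {
        return ByteBuffer.wrap(wire).getInt(); // ByteBuffer defaults to big-endian
    }

    // True if the value is far too large to be a real method-name length.
    static boolean isSuspicious(int length) {
        return length > MAX_NAME_LENGTH;
    }

    public static void main(String[] args) {
        // Assumed leftover bytes: a field header (type STRING = 0x0B,
        // field id 1 = 0x00 0x01) followed by the first byte of that
        // string's i32 length (0x00).
        byte[] leftover = {TYPE_STRING, 0x00, 0x01, 0x00};

        int length = decodeLength(leftover);
        System.out.println(length);                          // 184549632
        System.out.println(length / (1024 * 1024) + " MiB"); // 176 MiB
        System.out.println(isSuspicious(length));            // true
    }
}
```

A size check like `isSuspicious()` can only detect the desynchronization after the fact; it does not put the stream back in sync. This matches the comment above: after any exception, the connection's protocol/transport stack should be discarded and rebuilt rather than reused. Draining the stream based on `InputStream.available()` is a weaker alternative, because `available()` only estimates the bytes readable without blocking, not how much of the broken response is still in flight.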