thrift-dev mailing list archives

From "James E. King III (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (THRIFT-4591) Incompatibility using non-blocking server and frame transport on C++ side?
Date Thu, 05 Jul 2018 11:47:00 GMT

     [ https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James E. King III updated THRIFT-4591:
--------------------------------------
    Description: 
1) Implement a Thrift server with TNonblockingServer in C++.

2) Implement a Thrift client with the Lua library, using the framed transport.

3) Calling the remote interface fails: the client prints "TTransportException:0: Default (unknown)",
and the server logs a "TConnection::workSocket(): THRIFT_EAGAIN (unavailable resources)" error.

4) I investigated the fault with tcpdump. Attachment 9090.pcap shows that the frame message does
not contain the frame size field; the correct case, attachment 9090_1.pcap, shows that the frame
message contains 4 bytes (00 00 00 25) before the protocol id field.

5) Digging into the fault to find the root cause, I found a fault in the TFramedTransport:flush
function in TFramedTransport.lua. The original implementation is:

-----

function TFramedTransport:flush()
  if self.doWrite == false then
    return self.trans:flush()
  end

  -- If the write fails we still want wBuf to be clear
  local tmp = self.wBuf
  self.wBuf = ''
  local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
  self.trans:write(frame_len_buf)
  self.trans:write(tmp)
  self.trans:flush()
end

-----

which sends the frame size field and the rest of the message content independently.
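
To illustrate the framing described above, here is a minimal sketch of how a sender could build
the 4-byte size field and the payload into one string, so the underlying transport sees a single
write. This is not the patch attached to the ticket. The helper names encode_frame_len and
build_frame are hypothetical, and the size field is encoded in plain Lua rather than with
libluabpack. It assumes the size field is a 4-byte big-endian integer, as the 00 00 00 25 bytes in
the capture suggest.

```lua
-- Hypothetical helper (not part of the Thrift Lua library): encode n as a
-- 4-byte big-endian integer, the assumed layout of the frame size field.
local function encode_frame_len(n)
  local b4 = n % 256
  local b3 = math.floor(n / 256) % 256
  local b2 = math.floor(n / 65536) % 256
  local b1 = math.floor(n / 16777216) % 256
  return string.char(b1, b2, b3, b4)
end

-- Hypothetical helper: size field and payload joined into one string, so a
-- single write() can hand the whole frame to the transport. The
-- concatenation creates a second copy of the payload in memory.
local function build_frame(payload)
  return encode_frame_len(#payload) .. payload
end

-- A 37-byte (0x25) payload yields the 00 00 00 25 prefix seen in 9090_1.pcap.
local payload = string.rep("x", 0x25)
local frame = build_frame(payload)

print(#frame)
for i = 1, 4 do
  io.write(string.format("%02x ", frame:byte(i)))
end
print()
```

The trade-off is visible in build_frame: one write instead of two, at the cost of holding the
payload and the concatenated frame in memory at the same time.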

  was:
(jking): C++ TFramedTransport reads the frame size, then attempts to read the message.  If
it gets only part of the message, it returns the partial read, and the upper layer cannot
decode the message.  A further read may then be called, which tries to read a frame size
again, but at that point the stream could be in the middle of a message payload that the
underlying transport hadn't yet received.  It's amazing to see this in code that's been around so long!

Original Bug report:

1) Implement a Thrift server with TNonblockingServer in C++.

2) Implement a Thrift client with the Lua library, using the framed transport.

3) Calling the remote interface fails: the client prints "TTransportException:0: Default (unknown)",
and the server logs a "TConnection::workSocket(): THRIFT_EAGAIN (unavailable resources)" error.

4) I investigated the fault with tcpdump. Attachment 9090.pcap shows that the frame message does
not contain the frame size field; the correct case, attachment 9090_1.pcap, shows that the frame
message contains 4 bytes (00 00 00 25) before the protocol id field.

5) Digging into the fault to find the root cause, I found a fault in the TFramedTransport:flush
function in TFramedTransport.lua. The original implementation is:

-----

function TFramedTransport:flush()
  if self.doWrite == false then
    return self.trans:flush()
  end

  -- If the write fails we still want wBuf to be clear
  local tmp = self.wBuf
  self.wBuf = ''
  local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
  self.trans:write(frame_len_buf)
  self.trans:write(tmp)
  self.trans:flush()
end

-----

which sends the frame size field and the rest of the message content independently.

 ----------------------

(jking) Analysis of original report: the proposed fix makes the sender send once, but it
shouldn't matter if the size is sent separately from the payload.  The root cause is in the
receiver, in this case the C++ library.  This issue may not be limited to the C++ implementation,
so we need a test that inserts a pause between sending a frame size and sending the payload, to
see what happens on all the implementations.
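
As a rough illustration of the receiver behaviour the analysis above calls for, here is a minimal
sketch of a frame reassembler that tolerates the size field and the payload arriving in separate
reads. It is not the C++ TFramedTransport code or any Thrift API. FrameReader and feed are
hypothetical names, and the sketch assumes an unsigned 4-byte big-endian size field. Feeding the
size and the payload in separate calls stands in for the pause described in the proposed test.

```lua
-- Hypothetical frame reassembler (not Thrift API): buffers incoming bytes
-- and only hands out a frame once size field and full payload are present.
local FrameReader = {}
FrameReader.__index = FrameReader

function FrameReader.new()
  return setmetatable({ buf = "" }, FrameReader)
end

-- Append whatever bytes the socket delivered, then return a list of every
-- complete frame payload now available (possibly empty).
function FrameReader:feed(chunk)
  self.buf = self.buf .. chunk
  local frames = {}
  while #self.buf >= 4 do
    local b1, b2, b3, b4 = self.buf:byte(1, 4)
    local len = ((b1 * 256 + b2) * 256 + b3) * 256 + b4
    if #self.buf < 4 + len then
      -- Payload incomplete: keep the size field buffered and wait for more,
      -- instead of treating later payload bytes as a new size field.
      break
    end
    frames[#frames + 1] = self.buf:sub(5, 4 + len)
    self.buf = self.buf:sub(5 + len)
  end
  return frames
end

-- Simulate a sender that pauses between the size field and the payload.
local reader = FrameReader.new()
local first = reader:feed(string.char(0, 0, 0, 5))  -- size field only
local second = reader:feed("hel")                    -- partial payload
local third = reader:feed("lo")                      -- rest of the payload

print(#first, #second, #third, third[1])
```

The key point is that an incomplete payload leaves the buffered bytes untouched, so the next read
continues the same frame rather than starting a new one.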

We're not going to merge the Lua client fix, as it doubles the memory requirements to send,
despite reducing the write() count from 2 to 1.


> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-4591
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4591
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.11.0
>            Reporter: allen_lee
>            Assignee: James E. King III
>            Priority: Blocker
>         Attachments: 9090.pcap, 9090_1.pcap
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> 1) Implement a Thrift server with TNonblockingServer in C++.
> 2) Implement a Thrift client with the Lua library, using the framed transport.
> 3) Calling the remote interface fails: the client prints "TTransportException:0: Default (unknown)",
> and the server logs a "TConnection::workSocket(): THRIFT_EAGAIN (unavailable resources)" error.
> 4) I investigated the fault with tcpdump. Attachment 9090.pcap shows that the frame message does
> not contain the frame size field; the correct case, attachment 9090_1.pcap, shows that the frame
> message contains 4 bytes (00 00 00 25) before the protocol id field.
> 5) Digging into the fault to find the root cause, I found a fault in the TFramedTransport:flush
> function in TFramedTransport.lua. The original implementation is:
> -----
> function TFramedTransport:flush()
>   if self.doWrite == false then
>     return self.trans:flush()
>   end
>   -- If the write fails we still want wBuf to be clear
>   local tmp = self.wBuf
>   self.wBuf = ''
>   local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
>   self.trans:write(frame_len_buf)
>   self.trans:write(tmp)
>   self.trans:flush()
> end
> -----
> which sends the frame size field and the rest of the message content independently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
