hadoop-zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-916) Problem receiving messages from subscribed channels in c++ client
Date Wed, 03 Nov 2010 09:14:26 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927799#action_12927799
] 

Ivan Kelly commented on ZOOKEEPER-916:
--------------------------------------

The root cause is that startReceiving is called from both startDelivery and the connect callback.
This is correct
because there are scenarios where it may be needed in both. The checking of the receiving
flag is the correct way to
resolve this. 

However, this double call of startReceiving really happens *every* time you use the c++ client
to receive on a
subscription, so it wasn't clear why we've only seen the problem now. 

Turns out that it only triggers when a partial message is received, so we only saw it when
the messages were very big.
For example, if the message is 100 bytes, the 100 bytes will arrive to the c++ client at the
same time
sizeReadCallbackHandler will read the size out of the buffer and since the whole message is
already in the buffer will
call messageReadCallbackHandler directly instead of calling into asio to read more from the
buffer. After
messageReadCallbackHandler has finished, there will be 0 bytes in the buffer, or another whole
message in the buffer,
so the second invocation of sizeReadCallbackHandler will behave correctly. 

However, if the message is bigger than the MTU, then the first sizeReadCallbackHandler will
run, which reads the
message size from the buffer. It then sees that the remaining bytes in the buffer are fewer
than the size of the
message, so makes a call to async_read to get more, and the function end. At this point the
second
sizeReadCallbackHandler runs, sees junk and craps out. 


> Problem receiving messages from subscribed channels in c++ client 
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-916
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: contrib-hedwig
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: ZOOKEEPER-916.patch
>
>
> We see this bug with receiving messages from a subscribed channel.  This problem seems
to happen with larger messages.  The flow is to first read at least 4 bytes from the socket
channel. Extract the first 4 bytes to get the message size.  If we've read enough data into
the buffer already, we're done so invoke the messageReadCallbackHandler passing the channel
and message size.  If not, then do an async read for at least the remaining amount of bytes
in the message from the socket channel.  When done, invoke the messageReadCallbackHandler.
> The problem seems that when the second async read is done, the same sizeReadCallbackHandler
is invoked instead of the messageReadCallbackHandler.  The result is that we then try to read
the first 4 bytes again from the buffer.  This will get a random message size and screw things
up.  I'm not sure if it's an incorrect use of the boost asio async_read function or we're
doing the boost bind to the callback function incorrectly.
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler
system:0,512 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler:
size of buffer before reading message size: 512 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler:
size of incoming message 599, currently in buffer 508 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler:
Still have more data to read, 91 from channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler
system:0, 91 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler:
size of buffer before reading message size: 599 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler:
size of incoming message 134287360, currently in buffer 595 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler:
Still have more data to read, 134286765 from channel(0x80b7a18)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message