Date: Wed, 1 Oct 2014 16:10:33 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-12141) ClusterStatus should frame protobuf payloads

Andrew Purtell created HBASE-12141:
--------------------------------------

             Summary: ClusterStatus should frame protobuf payloads
                 Key: HBASE-12141
                 URL: https://issues.apache.org/jira/browse/HBASE-12141
             Project: HBase
          Issue Type: Bug
            Reporter: Andrew Purtell

The multicast ClusterStatusPublisher and its companion listener use datagram channels without any framing. Netty's ProtobufDecoder expects a complete PB message to be available in the ChannelBuffer. As one user reported on list:

{noformat}
org.apache.hadoop.hbase.client.ClusterStatusListener - ERROR - Unexpected exception, continuing.
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
	at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
	at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
	at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.<init>(ClusterStatusProtos.java:7554)
	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.<init>(ClusterStatusProtos.java:7512)
	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7689)
	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7684)
	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:182)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
	at org.jboss.netty.handler.codec.protobuf.ProtobufDecoder.decode(ProtobufDecoder.java:122)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.socket.oio.OioDatagramWorker.process(OioDatagramWorker.java:52)
	at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

The javadoc for ProtobufDecoder says:

{quote}
Decodes a received ChannelBuffer into a Google Protocol Buffers Message and MessageLite.
Please note that this decoder must be used with a proper FrameDecoder such as ProtobufVarint32FrameDecoder or LengthFieldBasedFrameDecoder if you are using a stream-based transport such as TCP/IP.
{quote}

Even though we are using a datagram transport, we have related issues, depending on what the sending and receiving OS does with overly large datagrams:
- We may receive a datagram containing a truncated message
- We may get an upcall while processing one fragment of a fragmented datagram, before the complete message is available
- We may not be able to send the overly large ClusterStatus in the first place. Linux claims to do PMTU discovery and return EMSGSIZE if a datagram payload exceeds the MTU, but will send a fragmented datagram if PMTU discovery is disabled.

I'm surprised we got the above report, given the default is to reject overly large datagram payloads, so perhaps the user is running a different server OS, or Netty datagram channels do their own fragmentation (I haven't checked). In any case, the server and client pipelines are definitely not doing any kind of framing. This is the multicast status listener from 0.98, for example:

{code}
b.setPipeline(Channels.pipeline(
  new ProtobufDecoder(ClusterStatusProtos.ClusterStatus.getDefaultInstance()),
  new ClusterStatusHandler()));
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
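To illustrate the framing scheme the javadoc points at, here is a minimal standalone sketch (not HBase or Netty code; the class and method names are illustrative) of protobuf-style varint32 length framing, the same scheme Netty's ProtobufVarint32LengthFieldPrepender and ProtobufVarint32FrameDecoder implement: the sender prepends the payload length as a base-128 varint, and the receiver can detect a truncated buffer and refuse to hand a partial payload to the protobuf parser.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public final class Varint32Framing {

  // Prepend the payload length, encoded as a base-128 varint,
  // low-order 7-bit groups first.
  public static byte[] frame(byte[] payload) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int v = payload.length;
    while ((v & ~0x7F) != 0) {       // more than 7 bits remain
      out.write((v & 0x7F) | 0x80);  // continuation bit set
      v >>>= 7;
    }
    out.write(v);                    // final byte, continuation bit clear
    out.write(payload, 0, payload.length);
    return out.toByteArray();
  }

  // Return the complete payload, or null if the buffer is truncated
  // (e.g. one fragment of a larger datagram) -- in that case the caller
  // should wait for more data instead of invoking the protobuf parser.
  public static byte[] unframe(byte[] buf) {
    int len = 0, shift = 0, i = 0;
    while (true) {
      if (i >= buf.length) return null;      // length prefix itself truncated
      int b = buf[i++];
      len |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
      shift += 7;
    }
    if (buf.length - i < len) return null;   // payload truncated
    return Arrays.copyOfRange(buf, i, i + len);
  }

  public static void main(String[] args) {
    byte[] msg = "cluster-status".getBytes();
    byte[] framed = frame(msg);
    System.out.println(Arrays.equals(unframe(framed), msg));               // true
    System.out.println(unframe(Arrays.copyOf(framed, framed.length - 3))); // null
  }
}
```

With a prefix like this the receiver turns the "invalid wire type" crash above into a recoverable "message incomplete" condition; in a Netty pipeline the equivalent would be adding the varint frame decoder/prepender handlers in front of ProtobufDecoder/ProtobufEncoder.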