incubator-giraph-dev mailing list archives

From "Jakob Homan (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-37) Implement Netty-backed rpc solution
Date Sat, 15 Oct 2011 01:08:11 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-37:
------------------------------

    Attachment: GIRAPH-37-wip.patch

Here's a work-in-progress patch for review; I have to spend next week working on something
else, so I wanted to get it out before it went stale.  It uses Finagle with Thrift.  The
experience was at first challenging due to Finagle ramp-up costs, then pleasant, and is now
challenging again due to stability issues.  95% of the size of the patch is generated Thrift
code; I'm not usually a fan of including generated code, but as explained below, this is a
reasonable approach for Finagle.
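For reference, this is roughly what wiring up the generated client looks like. This is a hedged sketch from memory of the Finagle-Thrift examples of the era, not code from the patch; {{PutMessages}}, {{putVertexMessages}}, and the host/port are invented names for illustration:
{code:java}
import java.net.InetSocketAddress;

import org.apache.thrift.protocol.TBinaryProtocol;

import com.twitter.finagle.Service;
import com.twitter.finagle.builder.ClientBuilder;
import com.twitter.finagle.thrift.ThriftClientFramedCodec;
import com.twitter.finagle.thrift.ThriftClientRequest;

// Sketch only: PutMessages.ServiceIface/ServiceToClient stand in for what the
// forked Thrift compiler emits; the generated methods return
// com.twitter.util.Future rather than blocking.
PutMessages.ServiceIface buildClient(String host, int port) {
  Service<ThriftClientRequest, byte[]> transport =
      ClientBuilder.safeBuild(ClientBuilder.get()
          .hosts(new InetSocketAddress(host, port))
          .codec(ThriftClientFramedCodec.get())
          .hostConnectionLimit(1)   // two of the many knobs
          .retries(2));             // discussed under "The bad" below
  return new PutMessages.ServiceToClient(transport,
      new TBinaryProtocol.Factory());
}
{code}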

The good:
* With this patch I can scale up to about 1k workers, although not reliably (see the bad
points below).
* This approach moves us away from Hadoop RPC, which helps the upcoming YARN work; Hadoop
RPC itself is also not ideal.
* Judging by what Hyunsik had to go through when he was looking at Netty+PB, Finagle
definitely saves quite a lot of work.
* This exercise has identified several improvements to the overall codebase that need to be
made.  I've opened GIRAPH-57, GIRAPH-55 and GIRAPH-54 for these.

The bad:
* The Thrift-Finagle combination uses a forked version of the Thrift compiler to generate
the interface Finagle expects.  Once up and running this is fine, but it means that we'd be
dependent on this oddity.  Also, we'd need to include the generated code, since it's too much
to ask regular developers (those not interested in the RPC layer) to download a forked Thrift
compiler from GitHub, compile it, keep it around, etc.
* There are quite a lot of knobs needed to get a reliable run with a large number of mappers.
 This is partially a fact of life with distributed RPC, and we can probably determine some
of them programmatically, but at the moment I can only get successful runs about 2/3 of the
time.  The rest of the time I get very hard-to-decipher stack traces such as:
{noformat}
WARNING: An exception was thrown by a user handler while handling an exception event ([id:
0x4b7f1841, /172.18.67.79:46082 :> esv4-hcl227.corp.linkedin.com/172.18.66.182:30047] EXCEPTION:
com.twitter.util.Promise$ImmutableResult: Result set multiple times: Throw(java.lang.RuntimeException:
Hit exception in proxied call))
java.lang.RuntimeException: Hit exception in proxied call
	at org.apache.giraph.comm.finaglerpc.ThriftRPCProxyClient$CDLListener.onFailure(ThriftRPCProxyClient.java:91)
	at com.twitter.util.Future$$anonfun$addEventListener$1.apply(Future.scala:277)
	at com.twitter.util.Future$$anonfun$addEventListener$1.apply(Future.scala:276)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.util.Promise$$anon$2$$anonfun$8.apply(Future.scala:506)
	at com.twitter.util.Promise$$anon$2$$anonfun$8.apply(Future.scala:497)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.finagle.service.RetryingFilter$$anonfun$1.apply(RetryingFilter.scala:73)
	at com.twitter.finagle.service.RetryingFilter$$anonfun$1.apply(RetryingFilter.scala:56)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.util.Promise$$anon$2$$anonfun$8$$anonfun$apply$7.apply(Future.scala:502)
	at com.twitter.util.Promise$$anon$2$$anonfun$8$$anonfun$apply$7.apply(Future.scala:502)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.util.Promise$$anon$1$$anonfun$7.apply(Future.scala:491)
	at com.twitter.util.Promise$$anon$1$$anonfun$7.apply(Future.scala:490)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.finagle.channel.ChannelService.com$twitter$finagle$channel$ChannelService$$reply(ChannelService.scala:51)
	at com.twitter.finagle.channel.ChannelService$$anon$1.exceptionCaught(ChannelService.scala:74)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
	at com.twitter.finagle.thrift.ThriftFrameCodec.handleUpstream(ThriftFrameCodec.scala:11)
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:52)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:76)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
	at com.twitter.finagle.thrift.ThriftFrameCodec.handleUpstream(ThriftFrameCodec.scala:11)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
	at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
{noformat}
Another one that happens quite a lot is {{Caused by: com.twitter.finagle.UnknownChannelException:
com.twitter.util.Promise$ImmutableResult: Result set multiple times: Throw(java.lang.RuntimeException:
Hit exception in proxied call)}}.  I think I need some aid from someone more experienced with
Finagle, but I'm a bit nervous about the underlying framework being difficult to debug and
configure.
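For context on the {{CDLListener}} frame at the top of that trace, here's a minimal reconstruction of what such a listener presumably looks like; this is a guess, not the patch's actual code, assuming it counts down a {{CountDownLatch}} as each call completes. The notable part is that {{onFailure}} throws from inside Finagle's callback chain, so a later attempt to satisfy the same Promise (e.g. from a retry) surfaces as the {{Result set multiple times}} error:
{code:java}
import java.util.concurrent.CountDownLatch;

import com.twitter.util.FutureEventListener;

// Hypothetical reconstruction: lets a caller block until all in-flight
// Finagle futures have completed, counting each one down on a shared latch.
class CDLListener<T> implements FutureEventListener<T> {
  private final CountDownLatch latch;

  CDLListener(CountDownLatch latch) {
    this.latch = latch;
  }

  @Override
  public void onSuccess(T value) {
    latch.countDown();
  }

  @Override
  public void onFailure(Throwable cause) {
    latch.countDown();
    // This throw runs inside Finagle's respond()/IVar.set() machinery (see
    // the trace above); when another completion path later updates the same
    // Promise, it fails with ImmutableResult: "Result set multiple times".
    throw new RuntimeException("Hit exception in proxied call", cause);
  }
}
{code}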

Currently the patch passes all unit tests (though the Finagle section itself needs more).
 Overall, I think the patch is worth pursuing; it could be committed with Hadoop RPC remaining
the default, with the config/stability issues resolved in follow-up patches.  Perhaps it's
just an issue of lousy configuration on my part.  Another option would be to look in a different
direction, such as MessagePack.
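To make the default-RPC point concrete, keeping Hadoop RPC as the default could be a single config switch, along these lines (the property and class names here are invented for illustration, not from the patch):
{code:java}
import org.apache.hadoop.conf.Configuration;

// "giraph.rpcImplementation" and all three client types are hypothetical.
static final String RPC_IMPL = "giraph.rpcImplementation";

static RpcClient createRpcClient(Configuration conf) {
  // Hadoop RPC stays the default; Finagle is opt-in until the
  // config/stability issues are sorted out.
  if ("finagle".equals(conf.get(RPC_IMPL, "hadoop"))) {
    return new FinagleThriftRpcClient(conf);
  }
  return new HadoopRpcClient(conf);
}
{code}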

Thoughts?
                
> Implement Netty-backed rpc solution
> -----------------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based RPC method with Netty, but ultimately
went in another direction. I think there is still value in this approach, and will also look
at Finagle.


        
