bookkeeper-user mailing list archives

From Ming Chen <mc...@cs.stonybrook.edu>
Subject Re: Hedwig Across-Region Configuration
Date Fri, 16 Jan 2015 21:17:29 GMT
I found that the problem was caused by my configuration. When setting
"regions" in hw_server.conf, the local region should NOT be included.

The problem went away after setting "regions=X.Y.Z.114:4080" for the first
region (reg1) and "regions=X.Y.Z.111:4080" for the second region (reg2).
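
To make the rule concrete, here is a minimal sketch of a sanity check that flags a "regions" value containing the hub's own address. It is not part of Hedwig: the RegionsCheck class and its methods are hypothetical, the comma/whitespace separator is an assumption, and X.Y.Z.111:4080 is assumed to be reg1's own hub (inferred from the settings above).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hedwig: checks whether a "regions"
// value from hw_server.conf lists the local hub by mistake.
final class RegionsCheck {

    // Split the value on commas and/or whitespace.
    // The separator is an assumption, not taken from the Hedwig docs.
    static List<String> parseRegions(String regionsValue) {
        return Arrays.stream(regionsValue.trim().split("[,\\s]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // True if the local hub address appears among the configured remote hubs.
    static boolean includesLocal(String regionsValue, String localHub) {
        return parseRegions(regionsValue).contains(localHub);
    }

    public static void main(String[] args) {
        // Assumed to be reg1's own hub address.
        String localHub = "X.Y.Z.111:4080";

        // The working setting for reg1: only the remote hub is listed.
        System.out.println(includesLocal("X.Y.Z.114:4080", localHub));

        // A setting that also lists the local hub, which is what to avoid.
        System.out.println(includesLocal("X.Y.Z.111:4080,X.Y.Z.114:4080", localHub));
    }
}
```

The comparison is a plain string match, so it would not notice the same hub written two ways (for example localhost:4080 versus 127.0.0.1:4080).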

Thanks,
Ming

On Tue, Jan 13, 2015 at 4:38 AM, Ivan Kelly <ivank@apache.org> wrote:

>    Hi Ming,
>
>  This looks like a bug. Feel free to dig in and try and fix it :)
>
>  The cross region stuff in hedwig was never tested extensively, so there's
> probably quite a few bugs in there.
>
>  Regards
>  Ivan
>
> On Mon, Jan 12, 2015 at 7:42 PM, Ming Chen <mchen@cs.stonybrook.edu>
> wrote:
>
>> FYI,  the cross-region communication is working now after I used the
>> latest code from git and enabled SSL in conf.
>>
>>  Even though there seems to be an infinite loop when I do "sub mytopic
>> myid1-1 2" in "hedwig console":
>>  [hedwig: (reg1) 164] sub mytopic myid1-1 2
>> SUB DONE AND RECEIVE
>> Finished 0.031 s.
>> [hedwig: (reg1) 165] Received message from topic mytopic for subscriber
>> myid1-1 : neeeeew-msg-from-reg2
>> Received message from topic mytopic for subscriber myid1-1 : mysg-1-2
>>  Received message from topic mytopic for subscriber myid1-1 :
>> abs-new-msg-from-reg1
>> Received message from topic mytopic for subscriber myid1-1 : mysg-1-2
>> Received message from topic mytopic for subscriber myid1-1 :
>> neeeeew-msg-from-reg2
>> Received message from topic mytopic for subscriber myid1-1 : msg-2-1
>>  Received message from topic mytopic for subscriber myid1-1 :
>> abs-new-msg-from-reg1
>> Received message from topic mytopic for subscriber myid1-1 : mysg-1-2
>> Received message from topic mytopic for subscriber myid1-1 :
>> neeeeew-msg-from-reg2
>>  ...
>>
>>  Thanks,
>> Ming
>>
>> On Thu, Jan 8, 2015 at 11:24 AM, Ming Chen <mchen@cs.stonybrook.edu>
>> wrote:
>>
>>>   Hi Ivan,
>>>
>>>  Thanks for the heads-up. Sorry that I didn't make it clear, but I did
>>> set the region option in hw_server.conf to "reg1" and "reg2" for the two
>>> regions, respectively.
>>>
>>>  I tried some more experiments, and got some error message with the
>>> following operations on just one region:
>>> (1) format
>>> (2) show topics # it throws an IOException, which is probably okay as we
>>> did not have any topic to show
>>> (3) pub mytopic1 hello-topic1
>>> (4) sub mytopic1 myid1 2
>>>
>>>  [hedwig: (reg1) 88] format
>>> You ask to format hedwig metadata stored in
>>> org.apache.hedwig.server.meta.ZkMetadataManagerFactory.
>>> Press <Return> to continue, or Q to cancel ...
>>> 2015-01-08 00:09:45,752 - INFO  - [main:HedwigAdmin@541] - Formatted
>>> Hedwig metadata successfully.
>>> 2015-01-08 00:09:45,757 - INFO  - [main:HedwigAdmin@544] - Removed old
>>> factory layout.
>>> 2015-01-08 00:09:45,770 - INFO  - [main:HedwigAdmin@548] - Created new
>>> factory layout.
>>> Formatted hedwig metadata successfully.
>>> Finished 2.352 s.
>>> [hedwig: (reg1) 89] show topics
>>> Unable to fetch the list of topics
>>> java.io.IOException: Failed to get topics list :
>>>         at
>>> org.apache.hedwig.server.meta.ZkMetadataManagerFactory.getTopics(ZkMetadataManagerFactory.java:98)
>>>         at
>>> org.apache.hedwig.admin.HedwigAdmin.getTopics(HedwigAdmin.java:331)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole$ShowCmd.showTopics(HedwigConsole.java:588)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole$ShowCmd.runCmd(HedwigConsole.java:564)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.processCmd(HedwigConsole.java:966)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.executeLine(HedwigConsole.java:937)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.run(HedwigConsole.java:1021)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.main(HedwigConsole.java:1036)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>> KeeperErrorCode = NoNode for /hedwig/reg1/topics
>>>         at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>>         at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>         at
>>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>>>         at
>>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>>>         at
>>> org.apache.hedwig.server.meta.ZkMetadataManagerFactory.getTopics(ZkMetadataManagerFactory.java:96)
>>>         ... 7 more
>>> Finished 0.015 s.
>>> [hedwig: (reg1) 90] pub mytopic1 hello-topic1
>>> PUB DONE
>>> Finished 0.472 s.
>>> [hedwig: (reg1) 91] sub mytopic1 myid1 2
>>> 2015-01-08 00:13:38,021 - INFO  - [New I/O worker #6:HChannelHandler@228]
>>> - Channel [id: 0x50aa85e6, /127.0.0.1:52095 :> localhost/127.0.0.1:4080]
>>> was disconnected to host localhost/127.0.0.1:4080.
>>> 2015-01-08 00:13:38,022 - INFO  - [New I/O worker
>>> #6:AbstractHChannelManager@357] - NonSubscription Channel [id:
>>> 0x50aa85e6, /127.0.0.1:52095 :> localhost/127.0.0.1:4080] to
>>> localhost/127.0.0.1:4080 disconnected.
>>> 2015-01-08 00:13:38,030 - INFO  - [New I/O worker #7:HChannelHandler@228]
>>> - Channel [id: 0x9615a67b, /127.0.0.1:52098 :> localhost/127.0.0.1:4080]
>>> was disconnected to host localhost/127.0.0.1:4080.
>>> 2015-01-08 00:13:38,031 - INFO  - [New I/O worker
>>> #7:SimpleHChannelManager@191] - Subscription Channel [id: 0x9615a67b,
>>> /127.0.0.1:52098 :> localhost/127.0.0.1:4080] disconnected from
>>> localhost/127.0.0.1:4080.
>>> 2015-01-08 00:13:38,037 - ERROR - [main:HedwigSubscriber@130] -
>>> Unexpected PubSubException thrown:
>>> org.apache.hedwig.exceptions.PubSubException$UncertainStateException:
>>> Server ack response never received before server connection disconnected!
>>>         at
>>> org.apache.hedwig.client.netty.impl.HChannelHandler.channelDisconnected(HChannelHandler.java:252)
>>>         at
>>> org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:120)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>         at
>>> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>         at
>>> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>>>         at
>>> org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
>>>         at
>>> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>         at
>>> org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
>>>         at
>>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>         at
>>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>         at
>>> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>         at
>>> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> SUB FAILED
>>> org.apache.hedwig.exceptions.PubSubException$ServiceDownException:
>>> org.apache.hedwig.exceptions.PubSubException$UncertainStateException:
>>> Server ack response never received before server connection disconnected!
>>>         at
>>> org.apache.hedwig.client.netty.HedwigSubscriber.subUnsub(HedwigSubscriber.java:133)
>>>         at
>>> org.apache.hedwig.client.netty.HedwigSubscriber.subscribe(HedwigSubscriber.java:194)
>>>         at
>>> org.apache.hedwig.client.netty.HedwigSubscriber.subscribe(HedwigSubscriber.java:181)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole$SubCmd.runCmd(HedwigConsole.java:291)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.processCmd(HedwigConsole.java:966)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.executeLine(HedwigConsole.java:937)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.run(HedwigConsole.java:1021)
>>>         at
>>> org.apache.hedwig.admin.console.HedwigConsole.main(HedwigConsole.java:1036)
>>> Caused by:
>>> org.apache.hedwig.exceptions.PubSubException$UncertainStateException:
>>> Server ack response never received before server connection disconnected!
>>>         at
>>> org.apache.hedwig.client.netty.impl.HChannelHandler.channelDisconnected(HChannelHandler.java:252)
>>>         at
>>> org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:120)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>         at
>>> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>         at
>>> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>>>         at
>>> org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
>>>         at
>>> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>         at
>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>         at
>>> org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
>>>         at
>>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>         at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>         at
>>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>         at
>>> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>         at
>>> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>>  Thanks,
>>> Ming
>>>
>>>
>>> On Thu, Jan 8, 2015 at 6:05 AM, Ivan Kelly <ivank@apache.org> wrote:
>>>  > Hi Ming,
>>> >
>>> > It's been a long time since I looked at the region stuff in hedwig,
>>> but I
>>> > think it could be that you don't seem to be setting the region
>>> identifier in
>>> > hw_server.conf. You need to change "region" in hw_server to some
>>> identifier,
>>> > like reg1 and reg2 for your example.
>>> >
>>> > Hope this helps,
>>> > Ivan
>>> >
>>>
>>>
>>
>
