incubator-s4-user mailing list archives

From Frank Zheng <bearzheng2...@gmail.com>
Subject Re: Run Twitter Trending Example on Multi Machines
Date Fri, 21 Sep 2012 02:40:22 GMT
Hi Matthieu,

I reconfigured the /etc/hosts file so that the local machine name maps to
the machine's real IP address instead of 127.0.0.1.
Then it worked!
Thank you so much.
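
For anyone checking the same thing, here is a minimal sketch (not part of S4)
that prints what the JVM resolves for the local host, using the
InetAddress.getLocalHost().getCanonicalHostName() call Matthieu mentioned.
The class name LocalHostCheck and the helper isLoopback are made up for
illustration. If it reports "localhost" or a loopback address, /etc/hosts
probably still maps the machine name to 127.0.0.1.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of S4.
class LocalHostCheck {

    /** Returns true if the given host name or IP literal resolves to a loopback address. */
    static boolean isLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            // An unresolvable name is not a loopback address.
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("address   = " + local.getHostAddress());
        System.out.println("canonical = " + local.getCanonicalHostName());
        if (local.isLoopbackAddress()) {
            // Remote nodes cannot reach this node through a loopback address.
            System.out.println("WARNING: local host resolves to a loopback address");
        }
    }
}
```

Running it on each machine before starting the S4 nodes should show whether
the machine name in the cluster topology will be usable from other machines.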

Now here comes another problem.
I followed the steps of the Run Twitter Trending Example. I created two
clusters with newCluster on the same server, testing.machine1:2182.
Then I set up two nodes of cluster1 and one node of cluster2, also on that
server, testing.machine1:2182.

When I deployed the twitter-counter app on cluster1, there was no problem.
When I deployed the twitter-adapter app on cluster2, it did not work.

[root@testing apache-s4-0.5.0-incubating-src]# ./s4 deploy
-s4r=/usr/apache-s4-0.5.0-incubating-src/test-apps/twitter-adapter/build/libs/twitter-adapter.s4r
-c=cluster2 -appName=twitter-adapter -zk=testing.machine1:2182
10:31:27.178 [main] ERROR org.apache.s4.tools.Deploy - Cannot deploy app
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to
zookeeper server within timeout: 10000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:876)
~[zkclient-0.1.jar:na]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
~[zkclient-0.1.jar:na]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:92)
~[zkclient-0.1.jar:na]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:76)
~[zkclient-0.1.jar:na]
    at org.apache.s4.tools.Deploy.main(Deploy.java:59)
~[s4-tools-0.5.0-incubating.jar:0.5.0-incubating]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
~[na:1.6.0_22]
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
~[na:1.6.0_22]
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[na:1.6.0_22]
    at java.lang.reflect.Method.invoke(Method.java:616) ~[na:1.6.0_22]
    at org.apache.s4.tools.Tools$Task.dispatch(Tools.java:54)
[s4-tools-0.5.0-incubating.jar:0.5.0-incubating]
    at org.apache.s4.tools.Tools.main(Tools.java:94)
[s4-tools-0.5.0-incubating.jar:0.5.0-incubating]


Then I deployed the twitter-adapter app using another server on machine1,
testing.machine1:2183. It worked.

[root@testing apache-s4-0.5.0-incubating-src]# ./s4 deploy
-s4r=/usr/apache-s4-0.5.0-incubating-src/test-apps/twitter-adapter/build/libs/twitter-adapter.s4r
-c=cluster2 -appName=twitter-adapter -zk=testing.machine1:2183
10:33:00.830 [main] INFO  org.apache.s4.tools.Deploy - Using specified S4R
[/usr/apache-s4-0.5.0-incubating-src/test-apps/twitter-adapter/build/libs/twitter-adapter.s4r],
the S4R archive will not be built from source (and corresponding parameters
are ignored)
10:33:00.911 [main] INFO  org.apache.s4.tools.Deploy - uploaded application
[twitter-adapter] to cluster [cluster2], using zookeeper znode
[/s4/clusters/cluster2/app/twitter-adapter], and s4r file
[/usr/apache-s4-0.5.0-incubating-src/test-apps/twitter-adapter/build/libs/twitter-adapter.s4r]
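
The two deploy attempts above differ only in the ZooKeeper port (2182 vs.
2183), so one thing worth ruling out is whether anything is still listening
on port 2182. Below is a minimal sketch, not part of S4, that tries a plain
TCP connection to each address. The class ZkPortCheck and its helpers are
made up for illustration, and a successful connect only shows that the port
is open, not that ZooKeeper itself is healthy. The code avoids Java 7
features because the logs show a 1.6 JVM.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper class, not part of S4 or ZooKeeper.
class ZkPortCheck {

    /** Extracts the host from a "host:port" string. */
    static String hostOf(String address) {
        return address.substring(0, address.lastIndexOf(':'));
    }

    /** Extracts the port from a "host:port" string. */
    static int portOf(String address) {
        return Integer.parseInt(address.substring(address.lastIndexOf(':') + 1));
    }

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        // Defaults are the two addresses from the commands above.
        String[] addresses = args.length > 0 ? args
                : new String[] {"testing.machine1:2182", "testing.machine1:2183"};
        for (String address : addresses) {
            boolean ok = canConnect(hostOf(address), portOf(address), 10000);
            System.out.println(address + " reachable: " + ok);
        }
    }
}
```

If 2182 is reported as unreachable while 2183 is reachable, the timeout is
more likely caused by the ZooKeeper server on 2182 being down or blocked than
by the S4 deploy command itself.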


Then I checked the logs of the PE node. The error is as follows.

[root@testing apache-s4-0.5.0-incubating-src]# ./s4 node -c=cluster1
-zk=testing.machine1:2182
10:36:05.165 [main] INFO  org.apache.s4.core.Main - Initializing S4 node
with :
- comm module class [org.apache.s4.comm.DefaultCommModule]
- comm configuration file [default.s4.comm.properties from classpath]
- core module class [org.apache.s4.core.DefaultCoreModule]
- core configuration file[default.s4.core.properties from classpath]
- extra modules: []
- inline parameters: []
10:36:05.175 [main] DEBUG org.apache.s4.core.Main - Adding named parameters
for injection : [s4.cluster.zk_address=testing.machine1:2182]
10:36:05.525 [main] INFO  org.apache.s4.core.Main - Starting S4 node. This
node will automatically download applications published for the cluster it
belongs to
10:36:16.745 [main] ERROR org.apache.s4.core.Main - Cannot start S4 node
com.google.inject.ProvisionException: Guice provision errors:

1) Error injecting constructor,
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to
zookeeper server within timeout: 10000
  at org.apache.s4.core.Server.<init>(Server.java:71)
  while locating org.apache.s4.core.Server

1 error
    at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
~[guice-3.0.jar:na]
    at
com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
~[guice-3.0.jar:na]
    at org.apache.s4.core.Main.startNode(Main.java:148)
[s4-core-0.5.0-incubating.jar:0.5.0-incubating]
    at org.apache.s4.core.Main.main(Main.java:75)
[s4-core-0.5.0-incubating.jar:0.5.0-incubating]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
~[na:1.6.0_22]
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
~[na:1.6.0_22]
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[na:1.6.0_22]
    at java.lang.reflect.Method.invoke(Method.java:616) ~[na:1.6.0_22]
    at org.apache.s4.tools.Tools$Task.dispatch(Tools.java:54)
[s4-tools-0.5.0-incubating.jar:0.5.0-incubating]
    at org.apache.s4.tools.Tools.main(Tools.java:94)
[s4-tools-0.5.0-incubating.jar:0.5.0-incubating]
Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to
connect to zookeeper server within timeout: 10000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:876)
~[zkclient-0.1.jar:na]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
~[zkclient-0.1.jar:na]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:92)
~[zkclient-0.1.jar:na]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:80)
~[zkclient-0.1.jar:na]
    at org.apache.s4.core.Server.<init>(Server.java:74)
~[s4-core-0.5.0-incubating.jar:0.5.0-incubating]
    at
org.apache.s4.core.Server$$FastClassByGuice$$69e0fd5b.newInstance(<generated>)
~[guice-3.0.jar:0.5.0-incubating]
    at
com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
~[guice-3.0.jar:na]
    at
com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
~[guice-3.0.jar:na]
    at
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
~[guice-3.0.jar:na]
    at
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
~[guice-3.0.jar:na]
    at
com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
~[guice-3.0.jar:na]
    at
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
~[guice-3.0.jar:na]
    at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
~[guice-3.0.jar:na]
    ... 9 common frames omitted



Looking forward to your reply.
Thanks.

Sincerely,
Yu



On Thu, Sep 20, 2012 at 5:25 PM, Matthieu Morel <mmorel@apache.org> wrote:

> Hi,
>
> as far as I can tell from the logs, the local host name of the node is not
> resolved correctly: it is resolved as "localhost" instead of the fully
> qualified host name.
>
> S4 currently uses the following method to resolve the local host name:
> InetAddress.getLocalHost().getCanonicalHostName()
>
> Note that getLocalHost() will return the loopback address if you have a
> security manager that doesn't allow resolving the local host. Otherwise
> there might be something wrong with your /etc/hosts file (or equivalent).
>
> Hope this helps,
>
> Matthieu
>
>
>
> On 9/20/12 4:49 AM, Frank Zheng wrote:
>
>> Hi,
>>
>> I ran the Twitter Trending Example on two machines to test the S4
>> fail-over mechanism.
>> First I set up five ZooKeeper servers, three on machine1 and two on
>> machine2.
>> Then I set up two PE nodes and one adapter node on machine2. Afterwards
>> I set up three standby nodes on machine1, two for PE and one for adapter.
>> When I shut down one PE node on machine2, ZooKeeper handed its task to
>> one standby PE node on machine1. That node acquired the task but did
>> not work correctly.
>>
>> Standby PE Node on machine1
>>
>> 10:35:57.742 [main] INFO  org.apache.s4.core.Main - Initializing S4 node
>> with :
>> - comm module class [org.apache.s4.comm.DefaultCommModule]
>> - comm configuration file [default.s4.comm.properties from classpath]
>> - core module class [org.apache.s4.core.DefaultCoreModule]
>> - core configuration file[default.s4.core.properties from classpath]
>> - extra modules: []
>> - inline parameters: []
>> 10:35:57.752 [main] DEBUG org.apache.s4.core.Main - Adding named
>> parameters for injection : [s4.cluster.zk_address=testing.machine1:2182]
>> 10:35:58.073 [main] INFO  org.apache.s4.core.Main - Starting S4 node.
>> This node will automatically download applications published for the
>> cluster it belongs to
>> 10:35:58.175 [main] INFO  o.a.s.comm.topology.AssignmentFromZK - New
>> session:88349612453724188; state is : SyncConnected
>> 10:35:58.185 [main] INFO  o.a.s.comm.topology.AssignmentFromZK - Could
>> not acquire task. Going into standby mode
>> 10:35:58.254 [main] INFO  org.apache.s4.core.Server - Loading
>> application [twitter-counter] from file [/tmp/tmp3353582636362855640s4r]
>> 10:35:58.255 [main] WARN  o.a.s4.base.util.S4RLoaderFactory - s4.tmp.dir
>> not specified, using temporary directory [/tmp/1348108558254-0] for
>> unpacking S4R. You may want to specify a parent non-temporary directory.
>> 10:35:58.255 [main] INFO  o.a.s4.base.util.S4RLoaderFactory - Unzipping
>> S4R archive in [/tmp/1348108558254-0]
>> 10:35:58.351 [main] INFO  org.apache.s4.core.Server - App class name is:
>> org.apache.s4.example.twitter.TwitterCounterApp
>> 10:35:58.423 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing
>> cluster topology to {
>> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{
>> partition=0,port=12000,machineName=localhost,taskId=Task-0},
>> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]} from
>> null
>> 10:35:58.458 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Adding
>> topology change listener:org.apache.s4.comm.tcp.TCPEmitter@e4c6320
>>
>>
>> When one working PE node failed on machine2, the standby PE node logged
>> the following:
>>
>> 10:39:24.047 [ZkClient-EventThread-19-testing.machine1:2182] INFO
>> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
>> nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{
>> partition=1,port=12001,machineName=localhost,taskId=Task-1}]}
>> from {
>> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{
>> partition=0,port=12000,machineName=localhost,taskId=Task-0},
>> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]}
>> 10:39:24.102 [ZkClient-EventThread-16-testing.machine1:2182] INFO
>> o.a.s.comm.topology.AssignmentFromZK - Successfully acquired
>> task:Task-0
>> by localhost
>> 10:39:24.116 [ZkClient-EventThread-19-testing.machine1:2182] INFO
>> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
>> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{
>> partition=0,port=12000,machineName=localhost,taskId=Task-0},
>> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]} from {
>> nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{
>> partition=1,port=12001,machineName=localhost,taskId=Task-1}]}
>> 10:39:24.159 [main] INFO  o.a.s4.comm.topology.ClustersFromZK - New
>> session:88349612453724194
>> 10:39:24.162 [main] INFO  o.a.s4.comm.topology.ClustersFromZK -
>> Detected
>> new stream [RawStatus]
>> 10:39:24.193 [main] INFO  o.a.s4.comm.topology.ClustersFromZK - New
>> session:88349612453724195
>> 10:39:24.205 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing
>> cluster topology to {
>> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{
>> partition=0,port=12000,machineName=localhost,taskId=Task-0},
>> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]} from
>> null
>> 10:39:24.212 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing
>> cluster topology to {
>> nbNodes=1,name=cluster2,mode=unicast,type=,nodes=[{
>> partition=0,port=13000,machineName=localhost,taskId=Task-0}]}
>> from null
>> 10:39:24.213 [main] INFO  org.apache.s4.core.Server - Loaded application
>> from file /tmp/tmp2695149871633020370s4r
>> 10:39:24.213 [main] INFO  o.a.s.d.DistributedDeploymentManager -
>> Successfully installed application twitter-counter
>> 10:39:24.231 [main] DEBUG o.a.s.c.g.OverloadDispatcherGenerator -
>> Dumping generated overload dispatcher class for PE of class [class
>> org.apache.s4.example.twitter.TopNTopicPE]
>> 10:39:24.249 [main] INFO  o.a.s4.example.twitter.TopNTopicPE - key: []
>> 10:39:24.254 [main] DEBUG o.a.s.c.g.OverloadDispatcherGenerator -
>> Dumping generated overload dispatcher class for PE of class [class
>> org.apache.s4.example.twitter.TopicCountAndReportPE]
>> 10:39:24.256 [main] DEBUG o.a.s.c.g.OverloadDispatcherGenerator -
>> Dumping generated overload dispatcher class for PE of class [class
>> org.apache.s4.example.twitter.TopicExtractorPE]
>> 10:39:24.256 [main] DEBUG o.a.s4.comm.topology.ClustersFromZK - Adding
>> input stream [RawStatus] for app [-1] in cluster [cluster1]
>> 10:39:24.332 [main] INFO  org.apache.s4.core.App - Init prototype
>> [org.apache.s4.example.twitter.TopNTopicPE].
>> 10:39:24.334 [main] DEBUG org.apache.s4.core.ProcessingElement -
>> Started
>> timer for PE prototype [org.apache.s4.example.twitter.TopNTopicPE], ID
>> [] with interval [10000].
>> 10:39:24.335 [main] DEBUG org.apache.s4.core.ProcessingElement -
>> Started
>> checkpointing timer for PE prototype
>> [org.apache.s4.example.twitter.TopNTopicPE], ID [] with interval [20]
>> [SECONDS].
>> 10:39:24.335 [main] INFO  org.apache.s4.core.App - Init prototype
>> [org.apache.s4.example.twitter.TopicCountAndReportPE].
>> 10:39:24.336 [main] DEBUG org.apache.s4.core.ProcessingElement -
>> Started
>> timer for PE prototype
>> [org.apache.s4.example.twitter.TopicCountAndReportPE], ID [] with
>> interval [10000].
>> 10:39:24.336 [main] INFO  org.apache.s4.core.App - Init prototype
>> [org.apache.s4.example.twitter.TopicExtractorPE].
>>
>>
>> This node halted here and did nothing until the adapter node on
>> machine2 failed and the standby adapter node on machine1 took over.
>> Then the halted PE nodes on machine1 worked correctly, but the working
>> PE nodes on machine2 stopped and logged the following.
>>
>> 10:43:44.064 [ZkClient-EventThread-27-testing.machine1:2182] INFO
>> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
>> nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]} from {
>> nbNodes=1,name=cluster2,mode=unicast,type=,nodes=[{
>> partition=0,port=13000,machineName=localhost,taskId=Task-0}]}
>> 10:43:44.113 [ZkClient-EventThread-27-testing.machine1:2182] INFO
>> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
>> nbNodes=1,name=cluster2,mode=unicast,type=,nodes=[{
>> partition=0,port=13000,machineName=localhost,taskId=Task-0}]}
>> from { nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]}
>>
>>
>> Does this mean that the PE nodes and the adapter node have to be
>> located on the same machine?
>> It seems that the local PE nodes cannot communicate with the adapter
>> node on the remote machine.
>>
>> Sincerely,
>> Yu Zheng
>>
>>
>>
>>
>


-- 
Sincerely,
Zheng Yu
Mobile:  (852) 60670059
Email:    bearzheng2011@gmail.com
