incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Run Twitter Trending Example on Multi Machines
Date Thu, 20 Sep 2012 09:25:56 GMT
Hi,

as far as I can tell from the logs, the local host name of the node is 
not resolved correctly: it is resolved as "localhost" instead of the 
fully qualified host name.

S4 currently uses the following method to resolve the local host name:
InetAddress.getLocalHost().getCanonicalHostName()

Note that getLocalHost() will return the loopback address if you have a 
security manager that does not allow the local host name to be resolved. 
Otherwise there might be something wrong with your /etc/hosts file (or 
equivalent).
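
To check what each machine reports, a small standalone program along these lines could be run on every node. It is only a sketch: the class name HostNameCheck and the helper resolvesToLoopback are made up for illustration and are not part of S4; the only call taken from S4's behaviour as described above is InetAddress.getLocalHost().getCanonicalHostName().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic class, not part of S4.
final class HostNameCheck {

    // Illustrative helper: returns true when the resolved name or address
    // would make other machines connect to themselves instead of this node.
    static boolean resolvesToLoopback(String canonicalName, boolean loopbackAddress) {
        return loopbackAddress
                || canonicalName.equals("localhost")
                || canonicalName.startsWith("localhost.");
    }

    public static void main(String[] args) throws UnknownHostException {
        // Same resolution path as the one described above.
        InetAddress local = InetAddress.getLocalHost();
        String canonicalName = local.getCanonicalHostName();

        System.out.println("address        : " + local.getHostAddress());
        System.out.println("canonical name : " + canonicalName);

        if (resolvesToLoopback(canonicalName, local.isLoopbackAddress())) {
            System.out.println("WARNING: the local host resolves to loopback; "
                    + "remote nodes will not be able to reach this node by that name.");
        }
    }
}
```

If the program prints "localhost" or a loopback address on machine1 or machine2, the name registered in ZooKeeper for that node will not be reachable from the other machine, which would match the machineName=localhost entries in your logs.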

Hope this helps,

Matthieu


On 9/20/12 4:49 AM, Frank Zheng wrote:
> Hi,
>
> I ran the Twitter Trending Example on two machines to test the S4
> Fail-over Mechanism.
> Firstly I set up five ZooKeeper servers, three on machine1 and two on
> machine2.
> Then I set up two PE nodes and one adapter node on machine2. Afterwards
> I set up three standby nodes on machine1, two for PE and one for adapter.
> When I shut down one PE node on machine2, ZooKeeper assigned its task
> to one standby PE node on machine1. That node acquired the task but
> did not work correctly.
>
> Standby PE Node on machine1
>
> 10:35:57.742 [main] INFO  org.apache.s4.core.Main - Initializing S4 node
> with :
> - comm module class [org.apache.s4.comm.DefaultCommModule]
> - comm configuration file [default.s4.comm.properties from classpath]
> - core module class [org.apache.s4.core.DefaultCoreModule]
> - core configuration file[default.s4.core.properties from classpath]
> - extra modules: []
> - inline parameters: []
> 10:35:57.752 [main] DEBUG org.apache.s4.core.Main - Adding named
> parameters for injection : [s4.cluster.zk_address=testing.machine1:2182]
> 10:35:58.073 [main] INFO  org.apache.s4.core.Main - Starting S4 node.
> This node will automatically download applications published for the
> cluster it belongs to
> 10:35:58.175 [main] INFO  o.a.s.comm.topology.AssignmentFromZK - New
> session:88349612453724188; state is : SyncConnected
> 10:35:58.185 [main] INFO  o.a.s.comm.topology.AssignmentFromZK - Could
> not acquire task. Going into standby mode
> 10:35:58.254 [main] INFO  org.apache.s4.core.Server - Loading
> application [twitter-counter] from file [/tmp/tmp3353582636362855640s4r]
> 10:35:58.255 [main] WARN  o.a.s4.base.util.S4RLoaderFactory - s4.tmp.dir
> not specified, using temporary directory [/tmp/1348108558254-0] for
> unpacking S4R. You may want to specify a parent non-temporary directory.
> 10:35:58.255 [main] INFO  o.a.s4.base.util.S4RLoaderFactory - Unzipping
> S4R archive in [/tmp/1348108558254-0]
> 10:35:58.351 [main] INFO  org.apache.s4.core.Server - App class name is:
> org.apache.s4.example.twitter.TwitterCounterApp
> 10:35:58.423 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing
> cluster topology to {
> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=localhost,taskId=Task-0},
> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]} from null
> 10:35:58.458 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Adding
> topology change listener:org.apache.s4.comm.tcp.TCPEmitter@e4c6320
>
>
> When one working PE node failed on machine2, the standby PE node had
> logs as follows
>
> 10:39:24.047 [ZkClient-EventThread-19-testing.machine1:2182] INFO
> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
> nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{partition=1,port=12001,machineName=localhost,taskId=Task-1}]}
> from {
> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=localhost,taskId=Task-0},
> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]}
> 10:39:24.102 [ZkClient-EventThread-16-testing.machine1:2182] INFO
> o.a.s.comm.topology.AssignmentFromZK - Successfully acquired task:Task-0
> by localhost
> 10:39:24.116 [ZkClient-EventThread-19-testing.machine1:2182] INFO
> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=localhost,taskId=Task-0},
> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]} from {
> nbNodes=1,name=cluster1,mode=unicast,type=,nodes=[{partition=1,port=12001,machineName=localhost,taskId=Task-1}]}
> 10:39:24.159 [main] INFO  o.a.s4.comm.topology.ClustersFromZK - New
> session:88349612453724194
> 10:39:24.162 [main] INFO  o.a.s4.comm.topology.ClustersFromZK - Detected
> new stream [RawStatus]
> 10:39:24.193 [main] INFO  o.a.s4.comm.topology.ClustersFromZK - New
> session:88349612453724195
> 10:39:24.205 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing
> cluster topology to {
> nbNodes=2,name=cluster1,mode=unicast,type=,nodes=[{partition=0,port=12000,machineName=localhost,taskId=Task-0},
> {partition=1,port=12001,machineName=localhost,taskId=Task-1}]} from null
> 10:39:24.212 [main] INFO  o.a.s4.comm.topology.ClusterFromZK - Changing
> cluster topology to {
> nbNodes=1,name=cluster2,mode=unicast,type=,nodes=[{partition=0,port=13000,machineName=localhost,taskId=Task-0}]}
> from null
> 10:39:24.213 [main] INFO  org.apache.s4.core.Server - Loaded application
> from file /tmp/tmp2695149871633020370s4r
> 10:39:24.213 [main] INFO  o.a.s.d.DistributedDeploymentManager -
> Successfully installed application twitter-counter
> 10:39:24.231 [main] DEBUG o.a.s.c.g.OverloadDispatcherGenerator -
> Dumping generated overload dispatcher class for PE of class [class
> org.apache.s4.example.twitter.TopNTopicPE]
> 10:39:24.249 [main] INFO  o.a.s4.example.twitter.TopNTopicPE - key: []
> 10:39:24.254 [main] DEBUG o.a.s.c.g.OverloadDispatcherGenerator -
> Dumping generated overload dispatcher class for PE of class [class
> org.apache.s4.example.twitter.TopicCountAndReportPE]
> 10:39:24.256 [main] DEBUG o.a.s.c.g.OverloadDispatcherGenerator -
> Dumping generated overload dispatcher class for PE of class [class
> org.apache.s4.example.twitter.TopicExtractorPE]
> 10:39:24.256 [main] DEBUG o.a.s4.comm.topology.ClustersFromZK - Adding
> input stream [RawStatus] for app [-1] in cluster [cluster1]
> 10:39:24.332 [main] INFO  org.apache.s4.core.App - Init prototype
> [org.apache.s4.example.twitter.TopNTopicPE].
> 10:39:24.334 [main] DEBUG org.apache.s4.core.ProcessingElement - Started
> timer for PE prototype [org.apache.s4.example.twitter.TopNTopicPE], ID
> [] with interval [10000].
> 10:39:24.335 [main] DEBUG org.apache.s4.core.ProcessingElement - Started
> checkpointing timer for PE prototype
> [org.apache.s4.example.twitter.TopNTopicPE], ID [] with interval [20]
> [SECONDS].
> 10:39:24.335 [main] INFO  org.apache.s4.core.App - Init prototype
> [org.apache.s4.example.twitter.TopicCountAndReportPE].
> 10:39:24.336 [main] DEBUG org.apache.s4.core.ProcessingElement - Started
> timer for PE prototype
> [org.apache.s4.example.twitter.TopicCountAndReportPE], ID [] with
> interval [10000].
> 10:39:24.336 [main] INFO  org.apache.s4.core.App - Init prototype
> [org.apache.s4.example.twitter.TopicExtractorPE].
>
>
> This node halted here and did not work, until the adapter node on
> machine2 failed and the standby node for adapter on machine1 worked.
> Then the halting PE nodes on machine1 worked correctly, but the working
> PE nodes on machine2 stopped and had logs as follows.
>
> 10:43:44.064 [ZkClient-EventThread-27-testing.machine1:2182] INFO
> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
> nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]} from {
> nbNodes=1,name=cluster2,mode=unicast,type=,nodes=[{partition=0,port=13000,machineName=localhost,taskId=Task-0}]}
> 10:43:44.113 [ZkClient-EventThread-27-testing.machine1:2182] INFO
> o.a.s4.comm.topology.ClusterFromZK - Changing cluster topology to {
> nbNodes=1,name=cluster2,mode=unicast,type=,nodes=[{partition=0,port=13000,machineName=localhost,taskId=Task-0}]}
> from { nbNodes=0,name=unknown,mode=unicast,type=,nodes=[]}
>
>
> Does this mean that the PE nodes and the adapter node must be located
> on the same machine?
> It seems that local PE nodes cannot communicate with an adapter node on
> a remote machine.
>
> Sincerely,
> Yu Zheng
>
>
>

