hadoop-common-user mailing list archives

From: Chris K Wensel <ch...@wensel.net>
Subject: Re: S3/EC2 setup problem: port 9001 unreachable
Date: Mon, 10 Mar 2008 17:50:40 GMT
Andreas

Here are some moderately useful notes on using EC2/S3, mostly learned
while working with Hadoop. The "groups can't see themselves" issue is
listed there <grin>.

http://www.manamplified.org/archives/2008/03/notes-on-using-ec2-s3.html
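
In short, the fix is to authorize the cluster's security group to accept
traffic from itself, so the slaves can reach the JobTracker on port 9001.
A rough sketch using the EC2 API tools; the group name "hadoop-cluster"
and the account ID below are placeholders, and the exact flags may vary
with your version of the tools:

  # allow instances in the group to reach each other on all ports
  # (covers the JobTracker on 9001 as well as the other daemon ports)
  ec2-authorize hadoop-cluster -o hadoop-cluster -u 111122223333

  # SSH from the outside is opened separately, per port
  ec2-authorize hadoop-cluster -P tcp -p 22

If I remember right, the stock Hadoop EC2 scripts do roughly the same
thing when they create their group.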

enjoy
ckw

On Mar 10, 2008, at 9:51 AM, Andreas Kostyrka wrote:

> Found it, it was a security group setup problem ;(
>
> Andreas
>
> On Monday, 2008-03-10 at 16:49 +0100, Andreas Kostyrka wrote:
>> Hi!
>>
>> I'm trying to set up a Hadoop 0.16.0 cluster on EC2/S3 (manually, not
>> using the Hadoop AMIs).
>>
>> I've got the S3-based HDFS working, but I'm stumped when I try to get a
>> test job running:
>>
>> hadoop@ec2-67-202-58-97:~/hadoop-0.16.0$ time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper /tmp/test.sh -reducer cat -input testlogs/* -output testlogs-output
>> additionalConfSpec_:null
>> null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
>> packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar17969/] [] /tmp/streamjob17970.jar tmpDir=null
>> 08/03/10 14:01:28 INFO mapred.FileInputFormat: Total input paths to process : 152
>> 08/03/10 14:02:58 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
>> 08/03/10 14:02:58 INFO streaming.StreamJob: Running job: job_200803101400_0001
>> 08/03/10 14:02:58 INFO streaming.StreamJob: To kill this job, run:
>> 08/03/10 14:02:58 INFO streaming.StreamJob: /home/hadoop/hadoop-0.16.0/bin/../bin/hadoop job -Dmapred.job.tracker=ec2-67-202-58-97.compute-1.amazonaws.com:9001 -kill job_200803101400_0001
>> 08/03/10 14:02:58 INFO streaming.StreamJob: Tracking URL: http://ip-10-251-75-165.ec2.internal:50030/jobdetails.jsp?jobid=job_200803101400_0001
>> 08/03/10 14:02:59 INFO streaming.StreamJob:  map 0%  reduce 0%
>>
>> Furthermore, when I try to connect to port 9001 on 10.251.75.165 via
>> telnet from the master host itself, it connects:
>> hadoop@ec2-67-202-58-97:~/hadoop-0.16.0$ telnet 10.251.75.165 9001
>> Trying 10.251.75.165...
>> Connected to 10.251.75.165.
>> Escape character is '^]'.
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> When I try to do this from other VMs in my cluster, it just hangs
>> (tcpdump on the master host shows no activity for TCP port 9001):
>>
>> hadoop@ec2-67-202-37-210:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 9001
>> Trying 10.251.75.165...
>>
>> hadoop@ec2-67-202-37-210:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 22
>> Trying 10.251.75.165...
>> Connected to ip-10-251-75-165.ec2.internal.
>> Escape character is '^]'.
>> SSH-2.0-OpenSSH_4.3p2 Debian-9
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> The same thing shows up when I connect to port 50030, which reports 0
>> nodes ready to process the job.
>>
>> Furthermore, the slaves show the following messages:
>> 2008-03-10 15:30:11,455 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001
>> 2008-03-10 15:31:12,465 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 1 time(s).
>> 2008-03-10 15:32:13,475 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 2 time(s).
>>
>> Last but not least, here is my site conf:
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> <configuration>
>>
>> <property>
>>  <name>fs.default.name</name>
>>  <value>s3://lookhad</value>
>>  <description>The name of the default file system.  A URI whose
>>  scheme and authority determine the FileSystem implementation.  The
>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>  the FileSystem implementation class.  The uri's authority is used to
>>  determine the host, port, etc. for a filesystem.</description>
>> </property>
>>
>> <property>
>>  <name>fs.s3.awsAccessKeyId</name>
>>  <value>2DFGTTFSDFDSZU5SDSD7S5202</value>
>> </property>
>>
>> <property>
>>  <name>fs.s3.awsSecretAccessKey</name>
>>  <value>RUWgsdfsd67SFDfsdflaI9Gjzfsd8789ksd2r1PtG</value>
>> </property>
>>
>> <property>
>>  <name>mapred.job.tracker</name>
>>  <value>ec2-67-202-58-97.compute-1.amazonaws.com:9001</value>
>>  <description>The host and port that the MapReduce job tracker runs
>>  at.  If "local", then jobs are run in-process as a single map
>>  and reduce task.
>>  </description>
>> </property>
>> </configuration>
>>
>> The master node listens on its private IP, not on localhost:
>> hadoop@ec2-67-202-58-97:~/hadoop-0.16.0$ netstat -an | grep 9001
>> tcp        0      0 10.251.75.165:9001      0.0.0.0:*               LISTEN
>>
>> Any ideas? My conclusions so far are:
>>
>> 1.) It's not a general connectivity problem, because I can connect to
>> port 22 without any problems.
>> 2.) OTOH, connectivity on port 9001 seems to be restricted even inside
>> the same group.
>> 3.) All the AWS docs tell me that VMs in one group have no firewall
>> between them.
>>
>> So what is happening here? Any ideas?
>>
>> Andreas

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/



