giraph-user mailing list archives

From Inci Cetindil <icetin...@gmail.com>
Subject Re: PageRankBenchmark fails due to IllegalStateException
Date Fri, 02 Dec 2011 07:04:07 GMT
I have tried it with various numbers of workers and it only worked with 1 worker.

I am not running multiple Giraph jobs at the same time. Does Giraph always use ports 30000 and up? I checked the ports in use with the "netstat" command and didn't see any of the ports 30000-30005 listed.
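
For reference, the netstat check was roughly the following; the second command is just a direct probe of the worker port from another node (host and port taken from the log quoted below):

  # list listening TCP sockets and look for anything in the 30000 range
  netstat -tln | grep ':300'

  # from a different machine, try to reach the worker port directly
  telnet rainbow-01 30004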

Inci

On Dec 1, 2011, at 7:03 PM, Avery Ching wrote:

> Hmmm...this is unusual.  I wonder if it is tied to the weird number of tasks you are getting.  Can you try it with various numbers of workers (e.g. 1, 2) and see if it works?
> 
> To me, the connection refused error indicates that either the server failed to bind to its port (are you running multiple Giraph jobs at the same time?) or the server died.
> 
> Avery
> 
> On 12/1/11 5:33 PM, Inci Cetindil wrote:
>> I am sure the machines can communicate with each other and the ports are not blocked. I can run a word count Hadoop job without any problem on these machines. My Hadoop version is 0.20.203.0.
>> 
>> Thanks,
>> Inci
>> 
>> On Dec 1, 2011, at 3:57 PM, Avery Ching wrote:
>> 
>>> Thanks for the logs.  I see a lot of issues like the following:
>>> 
>>> 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
>>> 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
>>> 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
>>> 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
>>> 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
>>> 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
>>> 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
>>> 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
>>> 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
>>> 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
>>> 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null)
>>> java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused
>>> 
>>> Are you sure that your machines can communicate with each other?  Are the ports 30000 and up blocked?  And you're right, you should have had only 6 tasks.  What version of Hadoop is this on?
>>> 
>>> Avery
>>> 
>>> On 12/1/11 2:43 PM, Inci Cetindil wrote:
>>>> Hi Avery,
>>>> 
>>>> I attached the logs for the first attempts. The weird thing is that even though I specified the number of workers as 5, I had 8 mapper tasks. You can see from the logs that tasks 6 and 7 failed immediately. Do you have any explanation for this behavior? Normally I should have only 6 tasks (5 workers + 1 master), right?
>>>> 
>>>> Thanks,
>>>> Inci
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
>>>> 
>>>>> Hi Inci,
>>>>> 
>>>>> I am not sure what's wrong.  I ran the exact same command on a freshly checked-out version of Giraph without any trouble.  Here's my output:
>>>>> 
>>>>> hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
>>>>> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
>>>>> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
>>>>> 11/12/01 10:58:06 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 11/12/01 10:58:23 INFO mapred.JobClient:  map 16% reduce 0%
>>>>> 11/12/01 10:58:35 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Job Counters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=77566
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Launched map tasks=6
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Timers
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total (milliseconds)=13468
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 3 (milliseconds)=41
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Setup (milliseconds)=11691
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Shutdown (milliseconds)=73
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=369
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 0 (milliseconds)=674
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 2 (milliseconds)=519
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 1 (milliseconds)=96
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Stats
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate edges=500
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep=4
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Last checkpointed superstep=2
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Current workers=5
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Current master task partition=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Sent messages=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate finished vertices=500
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate vertices=500
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   File Output Format Counters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Written=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   FileSystemCounters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_READ=590
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_READ=264
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=129240
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=55080
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   File Input Format Counters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Read=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Map input records=6
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Spilled Records=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Map output records=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=264
>>>>> 
>>>>> 
>>>>> Would it be possible to send me the logs from the first attempts for every map task?
>>>>> 
>>>>> i.e. from
>>>>> Task attempt_201111302343_0002_m_000000_0
>>>>> Task attempt_201111302343_0002_m_000001_0
>>>>> Task attempt_201111302343_0002_m_000002_0
>>>>> Task attempt_201111302343_0002_m_000003_0
>>>>> Task attempt_201111302343_0002_m_000004_0
>>>>> Task attempt_201111302343_0002_m_000005_0
>>>>> 
>>>>> I think that could help us find the issue.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Avery
>>>>> 
>>>>> On 12/1/11 1:17 AM, Inci Cetindil wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm running the PageRank benchmark example on a cluster with 1 master + 5 slave nodes. I tried it with a large number of vertices; when that failed, I decided to try running it with 500 vertices and 5 workers first.  However, it doesn't work even for 500 vertices.
>>>>>> I am using the latest version of Giraph from the trunk and running the following command:
>>>>>> 
>>>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>>> 
>>>>>> I attached the error message that I am receiving. Please let me know if I am missing something.
>>>>>> 
>>>>>> Best regards,
>>>>>> Inci
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
> 

