incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: PageRankBenchmark fails due to IllegalStateException
Date Fri, 02 Dec 2011 03:03:26 GMT
Hmmm... this is unusual.  I wonder if it is tied to the weird number of tasks you are
getting.  Can you try it with various numbers of workers (e.g. 1, 2) and see if it
works?
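
For example, you could re-use the exact invocation from your original mail and only
vary -w (a rough sketch; adjust the jar path to wherever your build lives):

  hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 1
  hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 2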

To me, the connection refused error indicates that perhaps the server failed to bind to
its port (are you running multiple Giraph jobs at the same time?) or that the server
died.
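
One rough way to narrow that down (a sketch only; the hostname and port are taken from
the log excerpt below and may differ in your current run) is to check whether anything
is actually listening on the worker's port while the job is running, and whether it is
reachable from another node:

  # on rainbow-01, while the job is still running
  netstat -tln | grep 30004

  # from one of the other slave machines
  telnet rainbow-01 30004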

Avery

On 12/1/11 5:33 PM, Inci Cetindil wrote:
> I am sure the machines can communicate with each other and the ports are not blocked.
> I can run a word count Hadoop job without any problem on these machines. My Hadoop
> version is 0.20.203.0.
>
> Thanks,
> Inci
>
> On Dec 1, 2011, at 3:57 PM, Avery Ching wrote:
>
>> Thanks for the logs.  I see a lot of issues like the following:
>>
>> 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
>> 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
>> 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
>> 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
>> 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
>> 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
>> 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
>> 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
>> 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
>> 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
>> 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null)
>> java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused
>>
>> Are you sure that your machines can communicate with each other?  Are the ports 30000
>> and up blocked?  And you're right, you should have only had 6 tasks.  What version of
>> Hadoop is this on?
>>
>> Avery
>>
>> On 12/1/11 2:43 PM, Inci Cetindil wrote:
>>> Hi Avery,
>>>
>>> I attached the logs for the first attempts. The weird thing is that even though I
>>> specified the number of workers as 5, I had 8 mapper tasks. You can see in the logs
>>> that tasks 6 and 7 failed immediately. Do you have any explanation for this behavior?
>>> Normally I should have 6 tasks, right?
>>>
>>> Thanks,
>>> Inci
>>>
>>>
>>>
>>>
>>> On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
>>>
>>>> Hi Inci,
>>>>
>>>> I am not sure what's wrong.  I ran the exact same command on a freshly checked-out
>>>> version of Giraph without any trouble.  Here's my output:
>>>>
>>>> hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
>>>> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
>>>> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
>>>> 11/12/01 10:58:06 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 11/12/01 10:58:23 INFO mapred.JobClient:  map 16% reduce 0%
>>>> 11/12/01 10:58:35 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Job Counters
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=77566
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Launched map tasks=6
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Timers
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total (milliseconds)=13468
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 3 (milliseconds)=41
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Setup (milliseconds)=11691
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Shutdown (milliseconds)=73
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=369
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 0 (milliseconds)=674
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 2 (milliseconds)=519
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 1 (milliseconds)=96
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Stats
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate edges=500
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep=4
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Last checkpointed superstep=2
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Current workers=5
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Current master task partition=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Sent messages=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate finished vertices=500
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate vertices=500
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   File Output Format Counters
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Written=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   FileSystemCounters
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_READ=590
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_READ=264
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=129240
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=55080
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   File Input Format Counters
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Read=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Map input records=6
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Spilled Records=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Map output records=0
>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=264
>>>>
>>>>
>>>> Would it be possible to send me the logs from the first attempts for every map
>>>> task?
>>>>
>>>> i.e. from
>>>> Task attempt_201111302343_0002_m_000000_0
>>>> Task attempt_201111302343_0002_m_000001_0
>>>> Task attempt_201111302343_0002_m_000002_0
>>>> Task attempt_201111302343_0002_m_000003_0
>>>> Task attempt_201111302343_0002_m_000004_0
>>>> Task attempt_201111302343_0002_m_000005_0
>>>>
>>>> I think that could help us find the issue.
>>>>
>>>> Thanks,
>>>>
>>>> Avery
>>>>
>>>> On 12/1/11 1:17 AM, Inci Cetindil wrote:
>>>>> Hi,
>>>>>
>>>>> I'm running PageRank benchmark example on a cluster with 1 master + 5
slave nodes. I have tried it with a large number of vertices; when I failed I decided to make
it run with 500 vertices and 5 workers first.  However, it doesn't work even for 500 vertices.
>>>>> I am using the latest version of Giraph from the trunk and running the following
>>>>> command:
>>>>>
>>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>>
>>>>> I attached the error message that I am receiving. Please let me know if I am
>>>>> missing something.
>>>>>
>>>>> Best regards,
>>>>> Inci
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>

