Return-Path: X-Original-To: apmail-incubator-giraph-user-archive@minotaur.apache.org Delivered-To: apmail-incubator-giraph-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7DB417747 for ; Thu, 1 Dec 2011 23:57:47 +0000 (UTC) Received: (qmail 36363 invoked by uid 500); 1 Dec 2011 23:57:47 -0000 Delivered-To: apmail-incubator-giraph-user-archive@incubator.apache.org Received: (qmail 36327 invoked by uid 500); 1 Dec 2011 23:57:47 -0000 Mailing-List: contact giraph-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: giraph-user@incubator.apache.org Delivered-To: mailing list giraph-user@incubator.apache.org Received: (qmail 36319 invoked by uid 99); 1 Dec 2011 23:57:47 -0000 Received: from minotaur.apache.org (HELO minotaur.apache.org) (140.211.11.9) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Dec 2011 23:57:47 +0000 Received: from localhost (HELO achingmbp15.local) (127.0.0.1) (smtp-auth username aching, mechanism plain) by minotaur.apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Dec 2011 23:57:47 +0000 Message-ID: <4ED8147A.3040404@apache.org> Date: Thu, 01 Dec 2011 15:57:46 -0800 From: Avery Ching User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0) Gecko/20111105 Thunderbird/8.0 MIME-Version: 1.0 To: giraph-user@incubator.apache.org Subject: Re: PageRankBenchmrk fails due to IllegalStateException References: <3B92DFF2-DDAD-4DC2-84D1-222E8A791658@gmail.com> <4ED7CEC9.3050602@apache.org> In-Reply-To: Content-Type: multipart/alternative; boundary="------------080401080604020304090603" This is a multi-part message in MIME format. --------------080401080604020304090603 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Thanks for the logs. I see a lot of issues like the following: 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s). 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s). 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s). 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s). 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s). 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s). 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s). 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s). 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s). 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s). 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null) java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused Are you sure that your machines can communicate to each other? Are the ports 30000 and up blocked? And you're right, you should have only had 6 tasks. What version of Hadoop is this on? Avery On 12/1/11 2:43 PM, Inci Cetindil wrote: > Hi Avery, > > I attached the logs for the first attemps. The weird thing is even if I specified the number of workers as 5, I had 8 mapper tasks. You can see the logs for tasks 6 and 7 failed immediately. Do you have any explanation for this behavior? Normally I should have 6 tasks, right? > > Thanks, > Inci > > > > > On Dec 1, 2011, at 11:00 AM, Avery Ching wrote: > >> Hi Inci, >> >> I am not sure what's wrong. I ran the exact same command on a freshly checked version of Graph without any trouble. Here's my output: >> >> hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5 >> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex >> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything >> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003 >> 11/12/01 10:58:06 INFO mapred.JobClient: map 0% reduce 0% >> 11/12/01 10:58:23 INFO mapred.JobClient: map 16% reduce 0% >> 11/12/01 10:58:35 INFO mapred.JobClient: map 100% reduce 0% >> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003 >> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31 >> 11/12/01 10:58:40 INFO mapred.JobClient: Job Counters >> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=77566 >> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Launched map tasks=6 >> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Timers >> 11/12/01 10:58:40 INFO mapred.JobClient: Total (milliseconds)=13468 >> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 3 (milliseconds)=41 >> 11/12/01 10:58:40 INFO mapred.JobClient: Setup (milliseconds)=11691 >> 11/12/01 10:58:40 INFO mapred.JobClient: Shutdown (milliseconds)=73 >> 11/12/01 10:58:40 INFO mapred.JobClient: Vertex input superstep (milliseconds)=369 >> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 0 (milliseconds)=674 >> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 2 (milliseconds)=519 >> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 1 (milliseconds)=96 >> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Stats >> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate edges=500 >> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep=4 >> 11/12/01 10:58:40 INFO mapred.JobClient: Last checkpointed superstep=2 >> 11/12/01 10:58:40 INFO mapred.JobClient: Current workers=5 >> 11/12/01 10:58:40 INFO mapred.JobClient: Current master task partition=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Sent messages=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate finished vertices=500 >> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate vertices=500 >> 11/12/01 10:58:40 INFO mapred.JobClient: File Output Format Counters >> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Written=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: FileSystemCounters >> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_READ=590 >> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_READ=264 >> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=129240 >> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=55080 >> 11/12/01 10:58:40 INFO mapred.JobClient: File Input Format Counters >> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Read=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Map-Reduce Framework >> 11/12/01 10:58:40 INFO mapred.JobClient: Map input records=6 >> 11/12/01 10:58:40 INFO mapred.JobClient: Spilled Records=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: Map output records=0 >> 11/12/01 10:58:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=264 >> >> >> Would it be possible to send me the logs from the first attempts for every map task? >> >> i.e. from >> Task attempt_201111302343_0002_m_000000_0 >> Task attempt_201111302343_0002_m_000001_0 >> Task attempt_201111302343_0002_m_000002_0 >> Task attempt_201111302343_0002_m_000003_0 >> Task attempt_201111302343_0002_m_000004_0 >> Task attempt_201111302343_0002_m_000005_0 >> >> I think that could help us find the issue. >> >> Thanks, >> >> Avery >> >> On 12/1/11 1:17 AM, Inci Cetindil wrote: >>> Hi, >>> >>> I'm running PageRank benchmark example on a cluster with 1 master + 5 slave nodes. I have tried it with a large number of vertices; when I failed I decided to make it run with 500 vertices and 5 workers first. However, it doesn't work even for 500 vertices. >>> I am using the latest version of Giraph from the trunk and running the following command: >>> >>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5 >>> >>> I attached the error message that I am receiving. Please let me know if I am missing something. >>> >>> Best regards, >>> Inci >>> >>> >>> >>> >>> --------------080401080604020304090603 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Thanks for the logs.  I see a lot of issues like the following:

2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null)
java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused

Are you sure that your machines can communicate to each other?  Are the ports 30000 and up blocked?  And you're right, you should have only had 6 tasks.  What version of Hadoop is this on?

Avery

On 12/1/11 2:43 PM, Inci Cetindil wrote:
Hi Avery,

I attached the logs for the first attemps. The weird thing is even if I specified the number of workers as 5, I had 8 mapper tasks. You can see the logs for tasks 6 and 7 failed immediately. Do you have any explanation for this behavior? Normally I should have 6 tasks, right?

Thanks,
Inci




On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:

Hi Inci,

I am not sure what's wrong.  I ran the exact same command on a freshly checked version of Graph without any trouble.  Here's my output:

hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
11/12/01 10:58:06 INFO mapred.JobClient:  map 0% reduce 0%
11/12/01 10:58:23 INFO mapred.JobClient:  map 16% reduce 0%
11/12/01 10:58:35 INFO mapred.JobClient:  map 100% reduce 0%
11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
11/12/01 10:58:40 INFO mapred.JobClient:   Job Counters 
11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=77566
11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/12/01 10:58:40 INFO mapred.JobClient:     Launched map tasks=6
11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Timers
11/12/01 10:58:40 INFO mapred.JobClient:     Total (milliseconds)=13468
11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 3 (milliseconds)=41
11/12/01 10:58:40 INFO mapred.JobClient:     Setup (milliseconds)=11691
11/12/01 10:58:40 INFO mapred.JobClient:     Shutdown (milliseconds)=73
11/12/01 10:58:40 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=369
11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 0 (milliseconds)=674
11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 2 (milliseconds)=519
11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 1 (milliseconds)=96
11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Stats
11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate edges=500
11/12/01 10:58:40 INFO mapred.JobClient:     Superstep=4
11/12/01 10:58:40 INFO mapred.JobClient:     Last checkpointed superstep=2
11/12/01 10:58:40 INFO mapred.JobClient:     Current workers=5
11/12/01 10:58:40 INFO mapred.JobClient:     Current master task partition=0
11/12/01 10:58:40 INFO mapred.JobClient:     Sent messages=0
11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate finished vertices=500
11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate vertices=500
11/12/01 10:58:40 INFO mapred.JobClient:   File Output Format Counters 
11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Written=0
11/12/01 10:58:40 INFO mapred.JobClient:   FileSystemCounters
11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_READ=590
11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_READ=264
11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=129240
11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=55080
11/12/01 10:58:40 INFO mapred.JobClient:   File Input Format Counters 
11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Read=0
11/12/01 10:58:40 INFO mapred.JobClient:   Map-Reduce Framework
11/12/01 10:58:40 INFO mapred.JobClient:     Map input records=6
11/12/01 10:58:40 INFO mapred.JobClient:     Spilled Records=0
11/12/01 10:58:40 INFO mapred.JobClient:     Map output records=0
11/12/01 10:58:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=264


Would it be possible to send me the logs from the first attempts for every map task?

i.e. from 
Task attempt_201111302343_0002_m_000000_0
Task attempt_201111302343_0002_m_000001_0
Task attempt_201111302343_0002_m_000002_0
Task attempt_201111302343_0002_m_000003_0
Task attempt_201111302343_0002_m_000004_0
Task attempt_201111302343_0002_m_000005_0

I think that could help us find the issue.

Thanks,

Avery

On 12/1/11 1:17 AM, Inci Cetindil wrote:
Hi,

I'm running PageRank benchmark example on a cluster with 1 master + 5 slave nodes. I have tried it with a large number of vertices; when I failed I decided to make it run with 500 vertices and 5 workers first.  However, it doesn't work even for 500 vertices. 
I am using the latest version of Giraph from the trunk and running the following command:

hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5

I attached the error message that I am receiving. Please let me know if I am missing something.

Best regards,
Inci






      

    

--------------080401080604020304090603--