From: Young Han
To: ghufran malik
Cc: user@giraph.apache.org
Date: Mon, 31 Mar 2014 13:20:32 -0400
Subject: Re: ConnectedComponents example

Hmm.. it looks like a failure during graph loading. Did you forget a .txt in the input path?

Young


On Mon, Mar 31, 2014 at 1:17 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
Hi,

Thanks for the speedy response!

It didn't work for me :(.

I updated the ConnectedComponentsVertex class with yours and added the new ConnectedComponentsInputFormat class. They are both in the giraph-examples/src/main/java/org/apache/giraph/examples package.
To compile the examples package, I cd'd to ~/Downloads/giraph-folder/giraph-1.0.0/giraph-examples and then ran "mvn compile", which resulted in BUILD SUCCESS. As a sanity check, I checked the jar to make sure it had the ConnectedComponentsInputFormat class in it, and it did.

I then updated my graph by taking out the vertex values, so in the end I had:

1 2
2 1 3 4
3 2
4 2

where the numbers are separated by tabs ([\t]).
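
For readers following along, the manual edit described above (dropping the vertex-value column) could also be scripted. The following is only a minimal sketch in plain Java, assuming whitespace-separated "id value neighbour..." lines; the class name StripVertexValues is made up for illustration and is not part of Giraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: drops the vertex-value column from "id value neighbour..." lines. */
final class StripVertexValues {

    /** Converts one line; the second field (the vertex value) is discarded. */
    static String stripValue(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected at least an id and a value: " + line);
        }
        List<String> kept = new ArrayList<>();
        kept.add(fields[0]);                                          // vertex id
        kept.addAll(Arrays.asList(fields).subList(2, fields.length)); // neighbours only
        return String.join("\t", kept);                               // tab-separated output
    }

    public static void main(String[] args) {
        // The four-line test graph from this thread, in its original form.
        String[] original = {"1\t0\t2", "2\t1\t1\t3\t4", "3\t2\t2", "4\t3\t2"};
        for (String line : original) {
            System.out.println(stripValue(line));
        }
    }
}
```

Run on the original four lines, this prints the "src dst1 dst2 ..." form shown above, with tabs between the numbers.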

The command I ran was:

hadoop jar /home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.ConnectedComponentsVertex \
    -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
    -vip /user/ghufran/input/my_graph \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/ghufran/giraph-output \
    -w 1

but I ended up with the output:

14/03/31 17:43:49 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/03/31 17:43:49 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/03/31 17:43:50 INFO mapred.JobClient: Running job: job_201403311622_0002
14/03/31 17:43:51 INFO mapred.JobClient:  map 0% reduce 0%
14/03/31 17:44:08 INFO mapred.JobClient:  map 50% reduce 0%
14/03/31 17:54:54 INFO mapred.JobClient:  map 0% reduce 0%
14/03/31 17:54:59 INFO mapred.JobClient: Job complete: job_201403311622_0002
14/03/31 17:54:59 INFO mapred.JobClient: Counters: 6
14/03/31 17:54:59 INFO mapred.JobClient:   Job Counters
14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=656429
14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/31 17:54:59 INFO mapred.JobClient:     Launched map tasks=2
14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/03/31 17:54:59 INFO mapred.JobClient:     Failed map tasks=1

Any ideas as to why this happened? Do you think I need to update the Hadoop version I am using?

Kind regards,

Ghufran


On Mon, Mar 31, 2014 at 5:11 PM, Young Han <young.han@uwaterloo.ca> wrote:
Hey,

Sure, I've uploaded the 1.0.0 classes I'm using:
http://pastebin.com/0cTdWrR4
http://pastebin.com/jWgVAzH6

They both go into giraph-examples/src/main/java/org/apache/giraph/examples

Note that the input format it accepts is of the form "src dst1 dst2 dst3 ..." (there is no vertex value). So your test graph would be:

1 2
2 1 3 4
3 2
4 2

The command I'm using is:

hadoop jar "$GIRAPH_DIR"/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.ConnectedComponentsVertex \
    -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
    -vip /user/${USER}/input/${inputgraph} \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/${USER}/giraph-output/ \
    -w 1

You'll want to change $GIRAPH_DIR, ${inputgraph}, and also the JAR file name since you're using Hadoop 0.20.203.

Young


On Mon, Mar 31, 2014 at 12:00 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
Hi Young,

First, I'd just like to say thank you for your help; it's much appreciated!

I did the sanity check and everything seems fine; I see the correct results.

Yes, I hadn't noticed that before. That is strange; I don't know how that happened, as the quick start guide (https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop 0.20.203 is the assumed default. I have both Giraph 1.1.0 and Giraph 1.0.0, and my Giraph 1.0.0 is compiled for 0.20.203.

I edited the code as you said for Giraph 1.1.0 but still received the same error as before, so I thought it might be due to the Hadoop version it was compiled for. I therefore decided to try modifying the code in Giraph 1.0.0 instead. However, since I do not have the correct input format class, and the vertex object is not instantiated in the ConnectedComponents class of Giraph 1.0.0, I was wondering if you could send me the full classes for both the ConnectedComponents class and the InputFormat, so that I know everything should be correct code-wise.

I will be trying to implement the InputFormat class and ConnectedComponents in the meantime, and if I get it working before you respond I'll update this post.

Thanks

Ghufran.


On Sun, Mar 30, 2014 at 5:41 PM, Young Han <young.han@uwaterloo.ca> wrote:
Hey,

As a sanity check, is the graph really loaded on HDFS? Do you see the correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"? (Where hadoop is your hadoop binary.)

Also, I noticed that your Giraph has been compiled for Hadoop 1.x, while the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
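
One quick way to spot this kind of mismatch is to compare the Hadoop version embedded in the jar name with the version the cluster reports. The sketch below is illustrative only: it assumes the "-for-hadoop-<version>-jar-with-dependencies.jar" naming seen in this thread, HadoopVersionCheck is a hypothetical helper (not a Giraph or Hadoop class), and the running version is passed in by hand rather than queried from Hadoop. Exact string equality is a deliberately strict assumption; versions that differ may still be compatible.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares the Hadoop version in a Giraph jar name with a given version. */
final class HadoopVersionCheck {

    // Assumed naming convention, taken from the jar names quoted in this thread.
    private static final Pattern JAR_VERSION =
        Pattern.compile("-for-hadoop-(\\d+(?:\\.\\d+)*)-jar-with-dependencies\\.jar$");

    /** Extracts the Hadoop version from the jar name, if the name follows the assumed pattern. */
    static Optional<String> hadoopVersionOfJar(String jarName) {
        Matcher m = JAR_VERSION.matcher(jarName);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True only if the jar name carries exactly the given Hadoop version. */
    static boolean matches(String jarName, String runningVersion) {
        return hadoopVersionOfJar(jarName).map(runningVersion::equals).orElse(false);
    }

    public static void main(String[] args) {
        String jar = "giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar";
        System.out.println(hadoopVersionOfJar(jar).orElse("unknown")); // 1.2.1
        System.out.println(matches(jar, "0.20.203.0"));                // false
    }
}
```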

Finally, this may be completely irrelevant, but I had issues running connected components on Giraph 1.0.0 and I fixed it by changing the algorithm and the input format. The input format you're using on 1.1.0 looks correct to me. The algorithm change I made was to the first "if" block in ConnectedComponentsComputation:
    if (getSuperstep() == 0) {
      currentComponent = vertex.getId().get();
      vertex.setValue(new IntWritable(currentComponent));

      sendMessageToAllEdges(vertex, vertex.getValue());

      vertex.voteToHalt();
      return;
    }
I forget what error this change solved, so it may not help in your case.
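
To make the idea behind that block easier to follow, here is a small stand-alone simulation of minimum-label propagation, the general technique connected-components examples of this kind rely on. It is a sketch in plain Java, not the Giraph implementation: there are no supersteps or messages, the class name MinLabelComponents is invented for illustration, and it assumes every edge is listed in both directions, as in the test graph from this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-alone simulation of min-label propagation (not Giraph code). */
final class MinLabelComponents {

    /** Returns vertex id -> component label, where the label is the smallest id in the component. */
    static Map<Integer, Integer> components(Map<Integer, List<Integer>> adjacency) {
        Map<Integer, Integer> label = new TreeMap<>();
        for (Integer v : adjacency.keySet()) {
            label.put(v, v); // like superstep 0 above: each vertex starts with its own id
        }
        boolean changed = true;
        while (changed) { // each loop iteration plays the role of one superstep
            changed = false;
            Map<Integer, Integer> next = new TreeMap<>(label);
            for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
                int best = label.get(e.getKey());
                for (int neighbour : e.getValue()) {
                    Integer other = label.get(neighbour);
                    if (other != null && other < best) {
                        best = other; // adopt the smallest label seen among neighbours
                    }
                }
                if (best < label.get(e.getKey())) {
                    next.put(e.getKey(), best);
                    changed = true;
                }
            }
            label = next;
        }
        return label;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = Map.of(
            1, List.of(2),
            2, List.of(1, 3, 4),
            3, List.of(2),
            4, List.of(2));
        System.out.println(components(graph)); // {1=1, 2=1, 3=1, 4=1}
    }
}
```

For the four-vertex test graph, every vertex ends up with label 1, meaning the whole graph is a single component.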

Young



On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <ghufran1malik@gmail.com> wrote:
Hello,

I am a final-year BSc Computer Science student using Apache Giraph for my final-year project and dissertation, and I would very much appreciate it if someone could help me with the following issue.

I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am having trouble running the ConnectedComponents example. I use the following command:

hadoop jar /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.ConnectedComponentsComputation \
    -vif org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat \
    -vip /user/ghufran/in/my_graph.txt \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/ghufran/outCC \
    -w 1

I believe it gets stuck in the input superstep, as the following is displayed in the terminal while the command is running:

14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB
....

which I traced back to the following if statement in the toString() method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):

    if (isInputSuperstep()) {
      sb.append("Loading data: ");
      sb.append(verticesLoaded).append(" vertices loaded, ");
      sb.append(vertexInputSplitsLoaded).append(
          " vertex input splits loaded; ");
      sb.append(edgesLoaded).append(" edges loaded, ");
      sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");

      sb.append("; min free memory on worker ").append(
          workerWithMinFreeMemory).append(" - ").append(
          DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
          DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
So it seems to me that it's not loading the input through the InputFormat correctly. I am assuming there's something wrong with my input format class or, probably more likely, something wrong with the graph I passed in?

I pass in a small graph that has the format vertex id, vertex value, neighbours, separated by tabs. My graph is shown below:

1 0 2
2 1 1 3 4
3 2 2
4 3 2
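
As a rough cross-check against the "0 vertices loaded ... 0 edges loaded" progress lines, the sketch below counts how many vertices and edges a file in this "id value neighbour..." layout describes. It is plain Java and makes no claim about how Giraph's own input format parses the file: the class name VertexLineCounts is hypothetical, and splitting on any whitespace is an assumption made here for simplicity.

```java
import java.util.List;

/** Hypothetical helper: counts vertices and edges in "id value neighbour..." lines. */
final class VertexLineCounts {

    /** Returns {vertexCount, edgeCount}; rejects lines that are not made of integers. */
    static int[] count(List<String> lines) {
        int vertices = 0;
        int edges = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                throw new IllegalArgumentException("expected at least an id and a value: " + line);
            }
            for (String field : fields) {
                Integer.parseInt(field); // throws NumberFormatException on non-integer fields
            }
            vertices++;
            edges += fields.length - 2; // everything after id and value is a neighbour
        }
        return new int[] {vertices, edges};
    }

    public static void main(String[] args) {
        // The four-line graph from this message.
        int[] c = count(List.of("1\t0\t2", "2\t1\t1\t3\t4", "3\t2\t2", "4\t3\t2"));
        System.out.println(c[0] + " vertices, " + c[1] + " edges"); // 4 vertices, 6 edges
    }
}
```

If the file parses cleanly here but the job still reports zero vertices loaded, the file contents are less likely to be the problem than the path, the input format class, or the environment.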

The full output after I ran my command is shown below. If anyone could explain to me why I am not getting the expected output, I would greatly appreciate it.

Many thanks,

Ghufran


FULL OUTPUT:


14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL: http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
14/03/30 10:48:45 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer ghufran:22181 --zkNode /_hadoopBsp/job_201403301044_0001/_haltComputation'
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_51
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.version=3.8.0-35-generic
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ghufran:22181 sessionTimeout=60000 watcher=org.apache.giraph.job.JobProgressTracker@209fa588
14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to server ghufran/127.0.1.1:22181. Will not attempt to authenticate using SASL (unknown error)
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection established to ghufran/127.0.1.1:22181, initiating session
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment complete on server ghufran/127.0.1.1:22181, sessionid = 0x1451263c44c0002, negotiated timeout = 600000
14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB






--001a11c248e0d443a604f5ea446e--