giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: ConnectedComponents example
Date Mon, 31 Mar 2014 17:20:32 GMT
Hmm.. it looks like a failure during graph loading. Did you forget a .txt
in the input path?
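
If it helps, a quick way to double-check programmatically that the vertex input
path really exists on HDFS (a rough sketch using the plain Hadoop FileSystem
API; the class name and the hard-coded path are just illustrative, taken from
your command, and you'd run it via the hadoop launcher so it sees your config):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckInputPath {
      public static void main(String[] args) throws Exception {
        // Picks up fs.default.name from the core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path input = new Path("/user/ghufran/input/my_graph");
        System.out.println("exists: " + fs.exists(input));

        // listStatus works for both a single file and a directory of files.
        if (fs.exists(input)) {
          for (FileStatus status : fs.listStatus(input)) {
            System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
          }
        }
      }
    }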

Young


On Mon, Mar 31, 2014 at 1:17 PM, ghufran malik <ghufran1malik@gmail.com> wrote:

> Hi,
>
> Thanks for the speedy response!
>
> It didn't work for me :(.
>
> I updated the ConnectedComponentsVertex class with yours and added in the
> new ConnectedComponentsInputFormat class. They are both in the
> giraph-examples/src/main/java/org/apache/giraph/examples package.
> To compile the example package:
> I cd'd to ~/Downloads/giraph-folder/giraph-1.0.0/giraph-examples
> and then typed "mvn compile" which resulted in BUILD SUCCESS. As a sanity
> check I checked the jar to make sure it had the
> ConnectedComponentsInputFormat class in it, and it did.
>
> I then updated my graph by taking out the vertex values so in the end I
> had:
>
>
> 1 2
> 2 1 3 4
> 3 2
> 4 2
>
> where the numbers are separated by tabs ([\t]).
>
> The command I ran was:
>
> hadoop jar
> /home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.ConnectedComponentsVertex -vif
> org.apache.giraph.examples.ConnectedComponentsInputFormat -vip
> /user/ghufran/input/my_graph -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/ghufran/giraph-output -w 1
>
>
> but I ended up with the output:
>
> 14/03/31 17:43:49 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> vertex index type is not known
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> vertex value type is not known
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> edge value type is not known
> 14/03/31 17:43:49 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 14/03/31 17:43:50 INFO mapred.JobClient: Running job: job_201403311622_0002
> 14/03/31 17:43:51 INFO mapred.JobClient:  map 0% reduce 0%
> 14/03/31 17:44:08 INFO mapred.JobClient:  map 50% reduce 0%
> 14/03/31 17:54:54 INFO mapred.JobClient:  map 0% reduce 0%
> 14/03/31 17:54:59 INFO mapred.JobClient: Job complete:
> job_201403311622_0002
> 14/03/31 17:54:59 INFO mapred.JobClient: Counters: 6
> 14/03/31 17:54:59 INFO mapred.JobClient:   Job Counters
> 14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=656429
> 14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 14/03/31 17:54:59 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/03/31 17:54:59 INFO mapred.JobClient:     Launched map tasks=2
> 14/03/31 17:54:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/03/31 17:54:59 INFO mapred.JobClient:     Failed map tasks=1
>
> Any ideas as to why this happened? Do you think I need to update the Hadoop
> version I am using?
>
> Kind regards,
>
> Ghufran
>
>
> On Mon, Mar 31, 2014 at 5:11 PM, Young Han <young.han@uwaterloo.ca> wrote:
>
>> Hey,
>>
>> Sure, I've uploaded the 1.0.0 classes I'm using:
>> http://pastebin.com/0cTdWrR4
>> http://pastebin.com/jWgVAzH6
>>
>> They both go into giraph-examples/src/main/java/org/apache/giraph/examples
>>
>> Note that the input format it accepts is of the form "src dst1 dst2 dst3
>> ..."---there is no vertex value. So your test graph would be:
>>
>> 1 2
>> 2 1 3 4
>> 3 2
>> 4 2
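>>
>> In case those pastebin links ever go stale: the input format is roughly along
>> these lines (a minimal sketch built on Giraph's TextVertexInputFormat /
>> TextVertexReaderFromEachLineProcessed helpers; the reader class name and the
>> exact structure are illustrative, not necessarily the pastebin code):
>>
>>   import java.io.IOException;
>>   import java.util.List;
>>   import java.util.regex.Pattern;
>>
>>   import org.apache.giraph.edge.Edge;
>>   import org.apache.giraph.edge.EdgeFactory;
>>   import org.apache.giraph.io.formats.TextVertexInputFormat;
>>   import org.apache.hadoop.io.IntWritable;
>>   import org.apache.hadoop.io.NullWritable;
>>   import org.apache.hadoop.io.Text;
>>   import org.apache.hadoop.mapreduce.InputSplit;
>>   import org.apache.hadoop.mapreduce.TaskAttemptContext;
>>
>>   import com.google.common.collect.Lists;
>>
>>   /** Reads "src dst1 dst2 ..." lines; there is no vertex value column. */
>>   public class ConnectedComponentsInputFormat extends
>>       TextVertexInputFormat<IntWritable, IntWritable, NullWritable> {
>>     private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>>
>>     @Override
>>     public TextVertexReader createVertexReader(
>>         InputSplit split, TaskAttemptContext context) throws IOException {
>>       return new ConnectedComponentsVertexReader();
>>     }
>>
>>     public class ConnectedComponentsVertexReader extends
>>         TextVertexReaderFromEachLineProcessed<String[]> {
>>       @Override
>>       protected String[] preprocessLine(Text line) throws IOException {
>>         return SEPARATOR.split(line.toString());
>>       }
>>
>>       @Override
>>       protected IntWritable getId(String[] tokens) throws IOException {
>>         return new IntWritable(Integer.parseInt(tokens[0]));
>>       }
>>
>>       @Override
>>       protected IntWritable getValue(String[] tokens) throws IOException {
>>         // No value column, so each vertex starts out labelled with its own id.
>>         return new IntWritable(Integer.parseInt(tokens[0]));
>>       }
>>
>>       @Override
>>       protected Iterable<Edge<IntWritable, NullWritable>> getEdges(
>>           String[] tokens) throws IOException {
>>         List<Edge<IntWritable, NullWritable>> edges =
>>             Lists.newArrayListWithCapacity(tokens.length - 1);
>>         for (int n = 1; n < tokens.length; n++) {
>>           edges.add(EdgeFactory.create(
>>               new IntWritable(Integer.parseInt(tokens[n])), NullWritable.get()));
>>         }
>>         return edges;
>>       }
>>     }
>>   }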
>>
>> The command I'm using is:
>>
>> hadoop jar
>> "$GIRAPH_DIR"/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar
>> org.apache.giraph.GiraphRunner \
>>     org.apache.giraph.examples.ConnectedComponentsVertex \
>>     -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
>>     -vip /user/${USER}/input/${inputgraph} \
>>     -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>     -op /user/${USER}/giraph-output/ \
>>     -w 1
>>
>> You'll want to change $GIRAPH_DIR, ${inputgraph}, and also the JAR file
>> name since you're using Hadoop 0.20.203.
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 12:00 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>
>>> Hi Young,
>>>
>>> First, I'd just like to say thank you for your help; it's much appreciated!
>>>
>>> I did the sanity check and everything seems fine; I see the correct
>>> results.
>>>
>>> Yes, I hadn't noticed that before; that is strange. I don't know how that
>>> happened, as the quick start guide (
>>> https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop
>>> 0.20.203 is the assumed default. I have both Giraph 1.1.0 and Giraph 1.0.0,
>>> and my Giraph 1.0.0 is compiled for 0.20.203.
>>>
>>> I edited the code as you said for Giraph 1.1.0 but still received the
>>> same error as before, so I thought it might be due to the Hadoop version it
>>> was compiled for. I then decided to try modifying the code in Giraph 1.0.0
>>> instead; however, since I do not have the correct input format class and the
>>> vertex object is not instantiated in the ConnectedComponents class of
>>> Giraph 1.0.0, I was wondering if you could send me the full classes for
>>> both the ConnectedComponents class and the InputFormat, so that I know
>>> everything should be correct code-wise.
>>>
>>> I will be trying to implement the InputFormat class and
>>> ConnectedComponents in the meantime, and if I get it working before you
>>> respond I'll update this post.
>>>
>>> Thanks
>>>
>>> Ghufran.
>>>
>>>
>>> On Sun, Mar 30, 2014 at 5:41 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>
>>>> Hey,
>>>>
>>>> As a sanity check, is the graph really loaded on HDFS? Do you see the
>>>> correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
>>>> (Where hadoop is your hadoop binary).
>>>>
>>>> Also, I noticed that your Giraph has been compiled for Hadoop 1.x,
>>>> while the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>>>>
>>>> Finally, this may be completely irrelevant, but I had issues running
>>>> connected components on Giraph 1.0.0 and I fixed it by changing the
>>>> algorithm and the input format. The input format you're using on 1.1.0
>>>> looks correct to me. The algorithm change I did was to the first "if" block
>>>> in ConnectedComponentsComputation:
>>>>
>>>>     if (getSuperstep() == 0) {
>>>>       currentComponent = vertex.getId().get();
>>>>       vertex.setValue(new IntWritable(currentComponent));
>>>>       sendMessageToAllEdges(vertex, vertex.getValue());
>>>>       vertex.voteToHalt();
>>>>       return;
>>>>     }
>>>>
>>>> I forget what error this change solved, so it may not help in your case.
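>>>>
>>>> For context, that block sits at the top of compute(); with the change
>>>> applied the whole method looks roughly like this (sketched from memory
>>>> against the 1.1.0-style BasicComputation API, so treat everything outside
>>>> the block quoted above as approximate rather than the exact stock code):
>>>>
>>>>   import java.io.IOException;
>>>>
>>>>   import org.apache.giraph.graph.BasicComputation;
>>>>   import org.apache.giraph.graph.Vertex;
>>>>   import org.apache.hadoop.io.IntWritable;
>>>>   import org.apache.hadoop.io.NullWritable;
>>>>
>>>>   public class ConnectedComponentsComputation extends
>>>>       BasicComputation<IntWritable, IntWritable, NullWritable, IntWritable> {
>>>>     @Override
>>>>     public void compute(
>>>>         Vertex<IntWritable, IntWritable, NullWritable> vertex,
>>>>         Iterable<IntWritable> messages) throws IOException {
>>>>       int currentComponent = vertex.getValue().get();
>>>>
>>>>       if (getSuperstep() == 0) {
>>>>         // Changed block: seed every vertex with its own id and flood it out.
>>>>         currentComponent = vertex.getId().get();
>>>>         vertex.setValue(new IntWritable(currentComponent));
>>>>         sendMessageToAllEdges(vertex, vertex.getValue());
>>>>         vertex.voteToHalt();
>>>>         return;
>>>>       }
>>>>
>>>>       // Later supersteps: keep the smallest component id seen so far and
>>>>       // only propagate when it changes, then go back to sleep.
>>>>       boolean changed = false;
>>>>       for (IntWritable message : messages) {
>>>>         if (message.get() < currentComponent) {
>>>>           currentComponent = message.get();
>>>>           changed = true;
>>>>         }
>>>>       }
>>>>       if (changed) {
>>>>         vertex.setValue(new IntWritable(currentComponent));
>>>>         sendMessageToAllEdges(vertex, vertex.getValue());
>>>>       }
>>>>       vertex.voteToHalt();
>>>>     }
>>>>   }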
>>>>
>>>> Young
>>>>
>>>>
>>>>
>>>> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am a final year BSc Computer Science student who is using Apache
>>>>> Giraph for my final year project and dissertation, and I would very much
>>>>> appreciate it if someone could help me with the following issue.
>>>>>
>>>>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
>>>>> having trouble running the ConnectedComponents example. I use the following
>>>>> command:
>>>>>
>>>>>  hadoop jar
>>>>> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>>>>> org.apache.giraph.GiraphRunner
>>>>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>>>>> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
>>>>> /user/ghufran/in/my_graph.txt -vof
>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>>> /user/ghufran/outCC -w 1
>>>>>
>>>>>
>>>>> I believe it gets stuck in the InputSuperstep as the following is
>>>>> displayed in terminal when the command is running:
>>>>>
>>>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>>> average 108.78MB
>>>>> ....
>>>>>
>>>>> which I traced back to the following if statement in the toString()
>>>>> method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):
>>>>>
>>>>>     if (isInputSuperstep()) {
>>>>>       sb.append("Loading data: ");
>>>>>       sb.append(verticesLoaded).append(" vertices loaded, ");
>>>>>       sb.append(vertexInputSplitsLoaded).append(
>>>>>           " vertex input splits loaded; ");
>>>>>       sb.append(edgesLoaded).append(" edges loaded, ");
>>>>>       sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>>>>
>>>>>       sb.append("; min free memory on worker ").append(
>>>>>           workerWithMinFreeMemory).append(" - ").append(
>>>>>           DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>>>>           DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>>>>>
>>>>> So it seems to me that the input is not being loaded correctly. I am
>>>>> assuming there's something wrong with my input format class or, probably
>>>>> more likely, something wrong with the graph I passed in?
>>>>>
>>>>> I pass in a small graph in the format "vertex id, vertex value,
>>>>> neighbours", with the fields separated by tabs; my graph is shown below:
>>>>>
>>>>> 1 0 2
>>>>> 2 1 1 3 4
>>>>> 3 2 2
>>>>> 4 3 2
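>>>>>
>>>>> (A purely hypothetical illustration, not Giraph source, of how a reader
>>>>> that expects "id neighbour1 neighbour2 ..." with no value column, as
>>>>> IntIntNullTextVertexInputFormat does, would see the second line above,
>>>>> assuming a simple tab/space split:)
>>>>>
>>>>>   String[] tokens = "2\t1\t1\t3\t4".split("[\t ]");
>>>>>   int id = Integer.parseInt(tokens[0]);  // 2
>>>>>   // tokens[1] is "1": intended here as the vertex value, but a no-value
>>>>>   // format simply reads it as one more neighbour id, so the value column
>>>>>   // turns into a spurious edge.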
>>>>>
>>>>> The full output after I ran my command is shown below. If anyone could
>>>>> explain to me why I am not getting the expected output, I would greatly
>>>>> appreciate it.
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Ghufran
>>>>>
>>>>>
>>>>> FULL OUTPUT:
>>>>>
>>>>>
>>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
>>>>> specified. Ensure your InputFormat does not require one.
>>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
>>>>> specified. Ensure your OutputFormat does not require one.
>>>>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is
>>>>> disabled (default), do not allow any task retries (setting
>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
>>>>> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>>>>> 14/03/30 10:48:45 INFO
>>>>> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
>>>>> writeHaltInstructions: To halt after next superstep execute:
>>>>> 'bin/halt-application --zkServer ghufran:22181 --zkNode
>>>>> /_hadoopBsp/job_201403301044_0001/_haltComputation'
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:
>>>>> host.name=ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.version=1.7.0_51
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.vendor=Oracle Corporation
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.io.tmpdir=/tmp
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.compiler=<NA>
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name
>>>>> =Linux
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.arch=amd64
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.version=3.8.0-35-generic
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:
>>>>> user.name=ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:user.home=/home/ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client
>>>>> connection, connectString=ghufran:22181 sessionTimeout=60000
>>>>> watcher=org.apache.giraph.job.JobProgressTracker@209fa588
>>>>> 14/03/30 10:48:45 INFO mapred.JobClient: Running job:
>>>>> job_201403301044_0001
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection
>>>>> to server ghufran/127.0.1.1:22181. Will not attempt to authenticate
>>>>> using SASL (unknown error)
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection
>>>>> established to ghufran/127.0.1.1:22181, initiating session
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment
>>>>> complete on server ghufran/127.0.1.1:22181, sessionid =
>>>>> 0x1451263c44c0002, negotiated timeout = 600000
>>>>>  14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>>> average 108.78MB
>>>>>
>>>>>
>>>>
>>>
>>
>
