giraph-user mailing list archives

From Mirko Kämpf <mirko.kae...@cloudera.com>
Subject Re: Sample data for Single Source shortest path
Date Sat, 01 Mar 2014 17:13:14 GMT
Here is my work log with the steps I use to prepare for building Giraph:

Requires Maven 3.x

mvn -version
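
A minimal sketch of how this check could be scripted, assuming `mvn -version` prints a line of the form "Apache Maven 3.x.y ...". The function names maven_major and require_maven3 are hypothetical helpers, not part of Maven or Giraph.

```shell
#!/bin/sh
# Hypothetical helper: read `mvn -version` output on stdin and print the
# major version number (assumes a line starting with "Apache Maven X.Y").
maven_major() {
  sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeeds only if the Maven found on the PATH reports major version 3 or newer.
require_maven3() {
  major="$(mvn -version 2>/dev/null | maven_major)"
  [ -n "$major" ] && [ "$major" -ge 3 ]
}
```

Usage could look like: require_maven3 || echo "Maven 3.x or newer is required" >&2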

Install JDK 1.7

http://www.if-not-true-then-false.com/2010/install-sun-oracle-java-jdk-jre-7-on-fedora-centos-red-hat-rhel/

## java ##
sudo alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_51/jre/bin/java 200000

## javaws ##
sudo alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_51/jre/bin/javaws 200000

## Java Browser (Mozilla) Plugin 32-bit ##
sudo alternatives --install /usr/lib/mozilla/plugins/libjavaplugin.so libjavaplugin.so /usr/java/jdk1.7.0_51/jre/lib/i386/libnpjp2.so 200000

## Java Browser (Mozilla) Plugin 64-bit ##
sudo alternatives --install /usr/lib64/mozilla/plugins/libjavaplugin.so libjavaplugin.so.x86_64 /usr/java/jdk1.7.0_51/jre/lib/amd64/libnpjp2.so 200000

## Install javac only if you installed JDK (Java Development Kit) package ##
sudo alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_51/bin/javac 200000
sudo alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_51/bin/jar 200000

Check JDK

export JAVA_HOME="/usr/java/jdk1.7.0_51"

java -version
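
A small sketch of an extra sanity check, assuming the usual JDK layout with bin/java and bin/javac under JAVA_HOME. The function name is_jdk_home is hypothetical; it only tests that the executables exist, not which Java version they are.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given directory looks like a JDK,
# i.e. it contains executable bin/java and bin/javac (a bare JRE lacks javac).
is_jdk_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}
```

Usage could look like: is_jdk_home "$JAVA_HOME" || echo "JAVA_HOME does not point at a JDK" >&2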

Checkout sources

git clone https://git-wip-us.apache.org/repos/asf/giraph.git


Apply the latest version of the unmerged documentation patch

wget https://issues.apache.org/jira/secure/attachment/12630040/GIRAPH-849.v3.patch

git apply --stat GIRAPH-849.v3.patch

git apply --check GIRAPH-849.v3.patch
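
Note that `git apply --stat` and `git apply --check` only inspect the patch; a plain `git apply` is what modifies the working tree. A minimal sketch that chains the three steps, where the function name apply_patch_checked is a hypothetical helper and not something from the Giraph repo:

```shell
#!/bin/sh
# Hypothetical helper: show a summary of the patch, dry-run it, and only
# then apply it to the working tree. Stops at the first failing step.
apply_patch_checked() {
  patch_file="$1"
  git apply --stat "$patch_file" &&   # summary only, changes nothing
  git apply --check "$patch_file" &&  # dry run, changes nothing
  git apply "$patch_file"             # actually modifies the files
}
```

Usage could look like: apply_patch_checked GIRAPH-849.v3.patch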



Build Giraph

mvn -Phadoop_2 -fae -DskipTests clean install

mvn -Phadoop_2 -DskipTests -Ddependency.locations.enabled=false site

mvn -Phadoop_2 -DskipTests site:stage

Do some cool work on doc and code ... ;-)


Grep for some code:

grep -r --include="*.java" WHAT WHERE
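
Here WHAT is the pattern and WHERE is the directory to search. A small sketch that wraps the same command, assuming a grep that supports --include (GNU grep does); the function name java_files_matching is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper around the command above: list the .java files under
# a directory that contain a pattern (-l prints file names only).
java_files_matching() {
  pattern="$1"
  dir="$2"
  grep -r -l --include="*.java" -- "$pattern" "$dir"
}
```

Usage, run from the checkout root, could look like: java_files_matching checkLocalJobRunnerConfiguration .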


Create the patch and submit it to JIRA and to the Review Board

http://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git/

git diff --no-prefix trunk > GIRAPH-{ISSUE_NUMBER}.patch



You can skip the yellow parts ... and maybe you need another profile, but I
just use hadoop_2 right now.

Good luck!
MK



On Sat, Mar 1, 2014 at 5:57 PM, Jyoti Yadav <rao.jyoti26yadav@gmail.com> wrote:

> Hi Mirko,
>
> Thanks for your reply. All MapReduce programs are running fine on this
> system, and it is a YARN setup.
>
> Please guide me on how to build Giraph with this Hadoop version. Do I
> need to install an external ZooKeeper as well?
>
> Thanks in advance.
>
> Jyoti
>
>
> On Sat, Mar 1, 2014 at 6:31 PM, Mirko Kämpf <mirko.kaempf@cloudera.com> wrote:
>
>> Hello,
>>
>> if you build Giraph for Hadoop 0.20...., the same jars will not work for
>> Hadoop version 2.2.0.
>> Right now I build with the profile -Phadoop_2 from the current 1.1 branch
>> in the git repo.
>>
>> How many nodes (physical servers or VMs) do you run on your 64-core
>> system?
>> Which Hadoop distro are you working with, and is it an MRv1 or MRv2 (YARN)
>> setup?
>>
>> Is your MapReduce system working properly ... can you run TeraSort, for
>> example?
>>
>> Cheers,
>> Mirko
>>
>>
>>
>> On Sat, Mar 1, 2014 at 4:15 AM, Jyoti Yadav <rao.jyoti26yadav@gmail.com> wrote:
>>
>>> Anyone, please reply. Is it a portability problem? Does Giraph have any
>>> issues with Hadoop 2.2.0?
>>>
>>> Do I need to build Giraph on the new system?
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Sat, Mar 1, 2014 at 2:28 PM, Jyoti Yadav <rao.jyoti26yadav@gmail.com> wrote:
>>>
>>>> Hi Sebastian,
>>>> Thanks for the links to big graphs.
>>>>
>>>> Actually, I want to tell you about a problem I am facing.
>>>>
>>>> Initially I was working with *hadoop 0.20.203*. I built Giraph
>>>> there, and it was running fine.
>>>>
>>>> Now, to test a very big graph problem and to compare the
>>>> performance, I moved to a new system which has 64 cores, 512 GB memory,
>>>> and 3 TB storage. Instead of building Giraph on the new system, I just
>>>> copied the Giraph folder from my previous system to this new system. The
>>>> new system runs *hadoop version 2.2.0*. I tried to execute the
>>>> SimpleShortestPathsComputation example on a sample data set. It throws
>>>> the following exception.
>>>>
>>>> I gave the following command to execute the job.
>>>>
>>>> hadoop jar
>>>> /home/abcd2014/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>> org.apache.giraph.GiraphRunner -Dgiraph.SplitMasterWorker=true
>>>> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>> -vip /user/abcd2014/giraph_input/tiny_graph.txt -vof
>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>> /user/abcd2014/output2/shortestpaths -w 1
>>>>
>>>>
>>>>
>>>> 14/03/01 12:44:46 INFO utils.ConfigurationUtils: No edge input format
>>>> specified. Ensure your InputFormat does not require one.
>>>> 14/03/01 12:44:46 INFO utils.ConfigurationUtils: No edge output format
>>>> specified. Ensure your OutputFormat does not require one.
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation:
>>>> mapreduce.job.counters.limit is deprecated. Instead, use
>>>> mapreduce.job.counters.max
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation:
>>>> mapred.job.map.memory.mb is deprecated. Instead, use mapreduce.map.memory.mb
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation:
>>>> mapred.job.reduce.memory.mb is deprecated. Instead, use
>>>> mapreduce.reduce.memory.mb
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation:
>>>> mapred.map.tasks.speculative.execution is deprecated. Instead, use
>>>> mapreduce.map.speculative
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation:
>>>> mapreduce.user.classpath.first is deprecated. Instead, use
>>>> mapreduce.job.user.classpath.first
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation:
>>>> mapred.map.max.attempts is deprecated. Instead, use
>>>> mapreduce.map.maxattempts
>>>> 14/03/01 12:44:46 INFO job.GiraphJob: run: Since checkpointing is
>>>> disabled (default), do not allow any task retries (setting
>>>> mapred.map.max.attempts = 0, old value = 4)
>>>> 14/03/01 12:44:46 INFO Configuration.deprecation: mapred.job.tracker is
>>>> deprecated. Instead, use mapreduce.jobtracker.address
>>>>
>>>> *Exception in thread "main" java.lang.IllegalArgumentException:
>>>> checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run
>>>> in split master / worker mode since there is only 1 task at a time! *
>>>>     at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:165)
>>>>     at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:233)
>>>>     at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>>>     at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>>>
>>>>
>>>>
>>>> Could you suggest something to fix this? If you need any further
>>>> details, please let me know.
>>>>
>>>> Thanks & Regards
>>>>
>>>> Jyoti
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Mar 1, 2014 at 1:35 PM, Sebastian Schelter <ssc@apache.org> wrote:
>>>>
>>>>> Hi Jyoti,
>>>>>
>>>>> You can find a couple of very large graphs in KONECT [1] and on the
>>>>> website of the Laboratory for Web Algorithmics at the University of
>>>>> Milan [2]. You will probably have to convert them to an appropriate
>>>>> format for Giraph.
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> [1] http://konect.uni-koblenz.de/
>>>>> [2] http://law.di.unimi.it/datasets.php
>>>>>
>>>>>
>>>>> On 03/01/2014 05:22 AM, Jyoti Yadav wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I got a new system with 64 cores, 512 GB memory, and 3 TB
>>>>>> storage. I want to test the performance of Giraph on this system.
>>>>>> Could anyone provide a link to a very large graph so that I can
>>>>>> execute the Single Source Shortest Path example? For this algorithm
>>>>>> to run, the graph should be weighted, and the input format to feed
>>>>>> it into Giraph is JsonLongDoubleFloatDouble.
>>>>>>
>>>>>> Thanks in advance...
>>>>>> With Regards
>>>>>>
>>>>>> Jyoti
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> --
>> Mirko Kämpf
>>
>> *Trainer* @ Cloudera
>>
>> tel: +49 *176 20 63 51 99*
>> skype: *kamir1604*
>> mirko@cloudera.com
>>
>>
>


-- 
-- 
Mirko Kämpf

*Trainer* @ Cloudera

tel: +49 *176 20 63 51 99*
skype: *kamir1604*
mirko@cloudera.com
