giraph-user mailing list archives

From Sundara Raghavan Sankaran <sun...@crayondata.com>
Subject RE: Running ConnectedComponents in a cluster.
Date Thu, 17 Apr 2014 18:37:31 GMT
I'd like to know how the directed graph is converted to an undirected
graph. Do we just create an edge in the opposite direction for every
existing edge, or is there some other way?
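For what it's worth, the reverse-edge approach could be sketched like this (an illustrative Python sketch, not from the thread; it assumes a whitespace-separated "src dst" edge list like the SNAP download):

```python
def to_undirected(lines):
    """Symmetrize a directed edge list: emit the reverse of every edge
    and de-duplicate. Assumes one 'src dst' pair per line."""
    edges = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and SNAP-style comment headers
        src, dst = line.split()[:2]
        edges.add((src, dst))
        edges.add((dst, src))  # the reverse edge
    return sorted(edges)
```

For example, `to_undirected(["1 2", "2 3"])` yields the four directed pairs covering both directions of each edge. De-duplicating through a set also means edges that already appear in both directions are not doubled.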
On Apr 17, 2014 9:32 PM, "Yu, Jaewook" <jaewook.yu@intel.com> wrote:

>  Ghufran,
>
>
>
> It looks like the graph loading is failing from your log:
>
>
>
> 14/04/17 16:12:31 INFO job.JobProgressTracker: Data from 3 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 2 - 141.96MB,
> average 142.66MB
>
>
>
> If you have access to JobTracker web interface (port 50030) or you know
> where the log files are located, take a look at the log for this failing
> job. That would be a good starting point to debug the issue.
>
>
>
> Thanks,
>
> Jae
>
>
>
> *From:* ghufran malik [mailto:ghufran1malik@gmail.com]
> *Sent:* Thursday, April 17, 2014 8:28 AM
> *To:* user@giraph.apache.org
> *Subject:* Re: Running ConnectedComponents in a cluster.
>
>
>
> I would appreciate your help with another problem of mine.
>
> I have an implementation of the TriangleCounting algorithm that runs
> correctly on the smaller dataset I used to test ConnectedComponents, but it
> fails on this larger dataset.
>
> The map task seems to fail, and I do not know why. The full output is below.
>
> 14/04/17 16:12:30 INFO
> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
> writeHaltInstructions: To halt after next superstep execute:
> 'bin/halt-application --zkServer ricotta.eecs.qmul.ac.uk:2181 --zkNode
> /_hadoopBsp/job_1381849812331_2770/_haltComputation'
> 14/04/17 16:12:31 INFO mapreduce.Job: Running job: job_1381849812331_2770
> 14/04/17 16:12:31 INFO job.JobProgressTracker: Data from 3 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 2 - 141.96MB,
> average 142.66MB
> 14/04/17 16:12:32 INFO mapreduce.Job: Job job_1381849812331_2770 running
> in uber mode : false
> 14/04/17 16:12:32 INFO mapreduce.Job:  map 100% reduce 0%
> 14/04/17 16:12:36 INFO job.JobProgressTracker: Data from 3 workers -
> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
> loaded, 0 edge input splits loaded; min free memory on worker 2 - 141.96MB,
> average 142.66MB
> 14/04/17 16:12:41 INFO job.JobProgressTracker: Data from 1 workers -
> Compute superstep 1: 0 out of 378222 vertices computed; 0 out of 3
> partitions computed; min free memory on worker 2 - 24.77MB, average 103.6MB
> 14/04/17 16:12:46 INFO job.JobProgressTracker: Data from 3 workers -
> Compute superstep 1: 0 out of 1134723 vertices computed; 0 out of 9
> partitions computed; min free memory on worker 1 - 22.5MB, average 23.36MB
> 14/04/17 16:12:48 INFO mapreduce.Job: Job job_1381849812331_2770 failed
> with state FAILED due to: Task failed task_1381849812331_2770_m_000002
> Job failed as tasks failed. failedMaps:1 failedReduces:0
>
> 14/04/17 16:12:48 INFO mapreduce.Job: Counters: 46
>     File System Counters
>         FILE: Number of bytes read=0
>         FILE: Number of bytes written=143668
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>         HDFS: Number of bytes read=37028489
>         HDFS: Number of bytes written=0
>         HDFS: Number of read operations=3
>         HDFS: Number of large read operations=0
>         HDFS: Number of write operations=0
>     Job Counters
>         Failed map tasks=1
>         Launched map tasks=3
>         Other local map tasks=3
>         Total time spent by all maps in occupied slots (ms)=24219
>         Total time spent by all reduces in occupied slots (ms)=0
>     Map-Reduce Framework
>         Map input records=2
>         Map output records=0
>         Input split bytes=88
>         Spilled Records=0
>         Failed Shuffles=0
>         Merged Map outputs=0
>         GC time elapsed (ms)=22209
>         CPU time spent (ms)=77200
>         Physical memory (bytes) snapshot=659660800
>         Virtual memory (bytes) snapshot=1657229312
>         Total committed heap usage (bytes)=372899840
>     Giraph Stats
>         Aggregate edges=0
>         Aggregate finished vertices=0
>         Aggregate sent message message bytes=0
>         Aggregate sent messages=0
>         Aggregate vertices=0
>         Current master task partition=0
>         Current workers=0
>         Last checkpointed superstep=0
>         Sent message bytes=0
>         Sent messages=0
>         Superstep=0
>     Giraph Timers
>         Initialize (ms)=0
>         Setup (ms)=0
>         Shutdown (ms)=0
>         Total (ms)=0
>     Zookeeper base path
>         /_hadoopBsp/job_1381849812331_2770=0
>     Zookeeper halt node
>         /_hadoopBsp/job_1381849812331_2770/_haltComputation=0
>     Zookeeper server:port
>         ricotta.eecs.qmul.ac.uk:2181=0
>     File Input Format Counters
>         Bytes Read=0
>     File Output Format Counters
>         Bytes Written=0
>
> Thanks,
>
> Ghufran
>
>
>
> On Thu, Apr 17, 2014 at 4:21 PM, ghufran malik <ghufran1malik@gmail.com>
> wrote:
>
> Oh, whoops! Yes, I meant I changed it to an undirected format!
>
>
>
> On Thu, Apr 17, 2014 at 4:11 PM, ghufran malik <ghufran1malik@gmail.com>
> wrote:
>
> Hi Jae,
>
> Thanks so much for pointing out that it wasn't directed. I made the
> changes and made a directed graph and connected components now works :)
>
> Thanks,
>
> Ghufran
>
>
>
> On Wed, Apr 16, 2014 at 7:31 PM, Yu, Jaewook <jaewook.yu@intel.com> wrote:
>
> Ghufran,
>
>
>
> The Youtube community dataset (com-youtube.ungraph.txt.gz<https://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz>)
> [1] is formatted as a directed graph, although the description says it's an
> undirected graph. With some minor changes in your conversion program, you
> should be able to generate a proper undirected adjacency list.
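One way such a conversion program could look (an illustrative Python sketch, not from the thread; it symmetrizes the edge list and emits lines in the "vertex id, vertex value, neighbours" layout shown later in this thread, with each vertex's value initialized to its own id):

```python
from collections import defaultdict

def edge_list_to_adjacency(lines):
    """Build 'id value neighbour...' lines from a directed 'src dst'
    edge list, adding the reverse of every edge so the result is an
    undirected adjacency list. Vertex value is set to the vertex id."""
    adj = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and SNAP-style comment headers
        src, dst = line.split()[:2]
        adj[src].add(dst)
        adj[dst].add(src)  # make the graph undirected
    out = []
    for v in sorted(adj, key=int):
        nbrs = " ".join(sorted(adj[v], key=int))
        out.append(f"{v} {v} {nbrs}")
    return out
```

A side effect worth noting: symmetrizing also guarantees that every neighbour id appears as a vertex of its own, since both endpoints of every edge get a line.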
>
>
>
> Hope this will help.
>
>
>
> Thanks,
>
> Jae
>
>
>
> [1] https://snap.stanford.edu/data/com-Youtube.html
>
>
>
> *From:* Yu, Jaewook [mailto:jaewook.yu@intel.com]
> *Sent:* Wednesday, April 16, 2014 11:00 AM
> *To:* user@giraph.apache.org
> *Subject:* RE: Running ConnectedComponents in a cluster.
>
>
>
> Hi Ghufran,
>
>
>
> Have you verified that the neighbors of each vertex actually exist? From
> your adjacency list, for example in the line 278447 278447 532613, is the
> neighbor's vertex id 532613 valid?
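A quick way to run that check over the whole file (an illustrative Python sketch, not from the thread; it assumes the whitespace-separated "id value neighbours..." layout shown below):

```python
def missing_neighbors(lines):
    """Return neighbour ids referenced in an adjacency list that never
    appear as a vertex id of their own."""
    vertex_ids, referenced = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vertex_ids.add(parts[0])
        referenced.update(parts[2:])  # parts[1] is the vertex value
    return sorted(referenced - vertex_ids, key=int)
```

An empty result means every neighbour has its own adjacency-list line; anything it returns is a dangling reference.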
>
>
>
> Thanks,
>
> Jae
>
>
>
>
>
> *From:* ghufran malik [mailto:ghufran1malik@gmail.com<ghufran1malik@gmail.com>]
>
> *Sent:* Wednesday, April 16, 2014 9:22 AM
> *To:* user@giraph.apache.org
> *Subject:* Running ConnectedComponents in a cluster.
>
>
>
> Hi,
>
> I have set up Giraph on my university cluster (Giraph
> 1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the
> connected components algorithm on a very small test dataset using 4 workers,
> and it produced the expected output.
>
>
> dataset:
>
> vertex id, vertex value, neighbours....
>
> 0 0 1
> 1 1 0 2 3
> 2 2 1 3
> 3 3 1 2
>
> output:
> 1    0
> 0    0
> 3    0
> 2    0
>
>
>
> However, when I tried to run this algorithm on a larger dataset (a
> reformatted version of com-youtube.ungraph from Stanford SNAP, matching
> IntIntNullTextVertexInputFormat), it completes successfully but produces
> incorrect output. It seems to just output each vertex id with its original
> value (each vertex's original value is its own id, as I set it).
>
> A snippet of the dataset is provided:
>
> vertex id, vertex value, neighbours....
> .......
> 278447 278447 532613
> 278449 278449 305447 324115 414238
> 83899 83899 153460 172614 176613 211448
> 773749 773749 845366
> 773748 773748 960388
> .......
>
> output produced:
> .............
> 73132    73132
> 831308    831308
> 199788    199788
> 763644    763644
> 300572    300572
> .............
>
> Not a single vertex value differs from its original vertex ID.
>
> The computation also stops after superstep 0 and goes no further, whereas
> the smaller dataset completes 3 supersteps.
>
> Does anyone have an idea why this is?
>
> Kind regards,
>
> Ghufran
>
