giraph-user mailing list archives

From ghufran malik <ghufran1ma...@gmail.com>
Subject Re: Running ConnectedComponents in a cluster.
Date Thu, 17 Apr 2014 15:11:59 GMT
Hi Jae,

Thanks so much for pointing out that it wasn't directed. I made the changes
and made a directed graph and connected components now works :)
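
For anyone who finds this thread later, the fix Jae describes below (listing every edge under both of its endpoints) can be sketched roughly as follows. This is only an illustration and not the actual conversion program: the function name `edges_to_undirected_adjacency` is hypothetical, and it assumes the SNAP file lists each edge once as a whitespace-separated pair of ids, with `#` comment lines at the top. The output lines follow the "vertex id, vertex value, neighbours" layout shown in the original question, with the vertex value set to the vertex id.

```python
from collections import defaultdict


def edges_to_undirected_adjacency(lines):
    """Turn 'u v' edge lines into 'id value neighbours...' lines.

    Hypothetical helper: every edge is recorded under BOTH endpoints,
    so each vertex lists all of its neighbours and every neighbour id
    also appears as a vertex id of its own.
    """
    neighbours = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' header comments (assumed SNAP layout).
        if not line or line.startswith("#"):
            continue
        u, v = (int(tok) for tok in line.split()[:2])
        neighbours[u].add(v)  # edge as given
        neighbours[v].add(u)  # reverse direction, making it undirected

    out = []
    for vid in sorted(neighbours):
        ids = " ".join(str(n) for n in sorted(neighbours[vid]))
        # The vertex value starts out equal to the vertex id.
        out.append(f"{vid} {vid} {ids}")
    return out


if __name__ == "__main__":
    sample = ["# FromNodeId\tToNodeId", "0\t1", "1\t2", "1\t3", "2\t3"]
    for row in edges_to_undirected_adjacency(sample):
        print(row)
```

Fed the four edges 0-1, 1-2, 1-3 and 2-3, this reproduces the small four-vertex test dataset quoted further down. If each edge is only listed under one endpoint, many vertices never see some of their neighbours, which fits the symptom of every vertex keeping its own id.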

Thanks,
Ghufran


On Wed, Apr 16, 2014 at 7:31 PM, Yu, Jaewook <jaewook.yu@intel.com> wrote:

>  Ghufran,
>
>
>
> The Youtube community dataset (com-youtube.ungraph.txt.gz<https://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz>)
> [1] is formatted as a directed graph although the description says it’s
> an undirected graph. With some minor changes in your conversion program, you
> should be able to generate a proper undirected adjacency list.
>
>
>
> Hope this will help.
>
>
>
> Thanks,
>
> Jae
>
>
>
> [1] https://snap.stanford.edu/data/com-Youtube.html
>
>
>
> *From:* Yu, Jaewook [mailto:jaewook.yu@intel.com]
> *Sent:* Wednesday, April 16, 2014 11:00 AM
> *To:* user@giraph.apache.org
> *Subject:* RE: Running ConnectedComponents in a cluster.
>
>
>
> Hi Ghufran,
>
>
>
> Have you verified the neighbors of each vertex actually exist? From your
> adjacency list, for example, 278447 278447 532613, is the neighbor’s vertex
> id 532613 valid?
>
>
>
> Thanks,
>
> Jae
>
>
>
>
>
> *From:* ghufran malik [mailto:ghufran1malik@gmail.com]
>
> *Sent:* Wednesday, April 16, 2014 9:22 AM
> *To:* user@giraph.apache.org
> *Subject:* Running ConnectedComponents in a cluster.
>
>
>
> Hi,
>
> I have set up Giraph on my university cluster of computers (Giraph
> 1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the
> connected components algorithm on a very small test dataset using 4 workers
> and it produced the expected output.
>
>
> dataset:
>
> vertex id, vertex value, neighbours....
>
> 0 0 1
> 1 1 0 2 3
> 2 2 1 3
> 3 3 1 2
>
> output:
> 1    0
> 0    0
> 3    0
> 2    0
>
>
>
> However, when I tried to run this algorithm on a larger dataset
> (a reformatted version of com-youtube.ungraph from Stanford SNAP, made to match the
> IntIntNullTextVertexInputFormat), it completes successfully but produces incorrect
> output. It seems to just output each vertex id with its original
> value (I set each vertex's original value to its vertex id).
>
> A snippet of the dataset is provided:
>
> vertex id, vertex value, neighbours....
> .......
> 278447 278447 532613
> 278449 278449 305447 324115 414238
> 83899 83899 153460 172614 176613 211448
> 773749 773749 845366
> 773748 773748 960388
> .......
>
> output produced:
> .............
> 73132    73132
> 831308    831308
> 199788    199788
> 763644    763644
> 300572    300572
> .............
>
> There's not one vertex value that differs from its original vertex
> ID.
>
> The computation also stops after superstep 0 is done and goes no further,
> whereas on my smaller dataset it completes 3 supersteps.
>
> Does anyone have an idea as to why this is?
>
> Kind regards,
>
> Ghufran
>
>
>
