giraph-user mailing list archives

From ghufran malik <ghufran1ma...@gmail.com>
Subject Re: Running ConnectedComponents in a cluster.
Date Thu, 17 Apr 2014 15:21:40 GMT
Oh whoops! Yes, I meant I changed it to an undirected format!
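
For anyone hitting the same problem, here is a minimal sketch of such a conversion. It assumes the SNAP file is a whitespace-separated "from to" edge list with "#" comment lines, and that each vertex's initial value is its own ID, as in the datasets quoted below. The function name is hypothetical, and this is not necessarily the conversion program that was actually used in this thread.

```python
from collections import defaultdict


def edges_to_undirected_adjacency(lines):
    """Turn 'from to' edge lines into 'id value neighbour...' lines.

    Every edge is recorded in both directions, so each vertex lists
    all of its neighbours, not only those it points to in the input.
    """
    neighbours = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        src, dst = (int(token) for token in line.split()[:2])
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # the reverse edge makes the graph undirected

    result = []
    for vertex in sorted(neighbours):
        # vertex id, vertex value (assumed equal to the id), then neighbours
        fields = [vertex, vertex] + sorted(neighbours[vertex])
        result.append(" ".join(str(field) for field in fields))
    return result


if __name__ == "__main__":
    sample = ["# FromNodeId ToNodeId", "0\t1", "1\t2", "1\t3", "2\t3"]
    for row in edges_to_undirected_adjacency(sample):
        print(row)
    # prints:
    # 0 0 1
    # 1 1 0 2 3
    # 2 2 1 3
    # 3 3 1 2
```

The sample edges reproduce the small four-vertex test dataset from the original question. Sets are used so that an edge listed twice in the input does not produce duplicate neighbours.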


On Thu, Apr 17, 2014 at 4:11 PM, ghufran malik <ghufran1malik@gmail.com> wrote:

> Hi Jae,
>
> Thanks so much for pointing out that it wasn't directed. I made the
> changes and made a directed graph and connected components now works :)
>
> Thanks,
> Ghufran
>
>
> On Wed, Apr 16, 2014 at 7:31 PM, Yu, Jaewook <jaewook.yu@intel.com> wrote:
>
>>  Ghufran,
>>
>>
>>
>> The Youtube community dataset (com-youtube.ungraph.txt.gz <https://snap.stanford.edu/data/bigdata/communities/com-youtube.ungraph.txt.gz>)
>> [1] is formatted as a directed graph although the description says it’s
>> an undirected graph. With some minor changes in your conversion program, you
>> should be able to generate a proper undirected adjacency list.
>>
>>
>>
>> Hope this will help.
>>
>>
>>
>> Thanks,
>>
>> Jae
>>
>>
>>
>> [1] https://snap.stanford.edu/data/com-Youtube.html
>>
>>
>>
>> *From:* Yu, Jaewook [mailto:jaewook.yu@intel.com]
>> *Sent:* Wednesday, April 16, 2014 11:00 AM
>> *To:* user@giraph.apache.org
>> *Subject:* RE: Running ConnectedComponents in a cluster.
>>
>>
>>
>> Hi Ghufran,
>>
>>
>>
>> Have you verified the neighbors of each vertex actually exist? From your
>> adjacency list, for example, 278447 278447 532613, is the neighbor’s vertex
>> id 532613 valid?
>>
>>
>>
>> Thanks,
>>
>> Jae
>>
>>
>>
>>
>>
>> *From:* ghufran malik [mailto:ghufran1malik@gmail.com]
>>
>> *Sent:* Wednesday, April 16, 2014 9:22 AM
>> *To:* user@giraph.apache.org
>> *Subject:* Running ConnectedComponents in a cluster.
>>
>>
>>
>> Hi,
>>
>> I have set up Giraph on my university cluster of computers (Giraph
>> 1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.3.1). I've successfully run the
>> connected components algorithm on a very small test dataset using 4 workers
>> and it produced the expected output.
>>
>>
>> dataset:
>>
>> vertex id, vertex value, neighbours....
>>
>> 0 0 1
>> 1 1 0 2 3
>> 2 2 1 3
>> 3 3 1 2
>>
>> output:
>> 1    0
>> 0    0
>> 3    0
>> 2    0
>>
>>
>>
>> However, when I tried to run this algorithm on a larger dataset (a
>> reformatted version of com-youtube.ungraph from Stanford SNAP, converted to
>> match the IntIntNullTextVertexInputFormat), it completes successfully but
>> produces incorrect output. It seems to just output each vertex ID with its
>> original value (I set each vertex's original value to its vertex ID).
>>
>> A snippet of the dataset is provided:
>>
>> vertex id, vertex value, neighbours....
>> .......
>> 278447 278447 532613
>> 278449 278449 305447 324115 414238
>> 83899 83899 153460 172614 176613 211448
>> 773749 773749 845366
>> 773748 773748 960388
>> .......
>>
>> output produced:
>> .............
>> 73132    73132
>> 831308    831308
>> 199788    199788
>> 763644    763644
>> 300572    300572
>> .............
>>
>> There's not one vertex value that isn't the same as its original vertex
>> ID.
>>
>> The computation also stops after superstep 0 is done and goes no further,
>> whereas my smaller dataset completes 3 supersteps.
>>
>> Does anyone have an idea as to why this is?
>>
>> Kind regards,
>>
>> Ghufran
>>
>>
>>
>
>
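
The symptom in the original question (every vertex keeps its own ID and the job stops after superstep 0) is consistent with edges being stored in one direction only. In the dataset snippet quoted above, every listed neighbour ID is larger than the ID of the vertex that lists it, so no vertex ever sees a smaller label to adopt. Below is a minimal sketch of min-label propagation that illustrates this. It is a simplified, single-process model written for this note, not Giraph's actual implementation, and the function name is hypothetical.

```python
def min_label_propagation(adjacency):
    """Simulate min-label propagation over out-edges only.

    adjacency maps each vertex id to the list of neighbours on its line.
    Returns (labels, supersteps), where supersteps counts every round
    that ran, including superstep 0.
    """
    labels = {vertex: vertex for vertex in adjacency}
    inbox = {}

    # Superstep 0: adopt the smallest neighbour id if it beats the own id,
    # then notify only the neighbours that could still improve.
    for vertex, nbrs in adjacency.items():
        smallest = min(nbrs, default=vertex)
        if smallest < labels[vertex]:
            labels[vertex] = smallest
            for nbr in nbrs:
                if nbr > smallest:
                    inbox.setdefault(nbr, []).append(smallest)
    supersteps = 1

    # Later supersteps run only while messages are in flight.
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            smallest = min(messages)
            if smallest < labels.get(vertex, vertex):
                labels[vertex] = smallest
                for nbr in adjacency.get(vertex, []):
                    outbox.setdefault(nbr, []).append(smallest)
        inbox = outbox
        supersteps += 1
    return labels, supersteps


if __name__ == "__main__":
    undirected = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
    one_way = {0: [1], 1: [2, 3], 2: [3], 3: []}  # same edges, one direction
    print(min_label_propagation(undirected))
    # prints: ({0: 0, 1: 0, 2: 0, 3: 0}, 3)
    print(min_label_propagation(one_way))
    # prints: ({0: 0, 1: 1, 2: 2, 3: 3}, 1)
```

With both directions present, the four-vertex test graph converges to label 0 in 3 supersteps, matching the small run described in the question. With one direction only, no label changes and no messages are sent, so the run ends after superstep 0.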
