incubator-hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-395) Example: PageRank
Date Mon, 13 Jun 2011 18:54:52 GMT

    [ https://issues.apache.org/jira/browse/HAMA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048708#comment-13048708 ]

Thomas Jungblut commented on HAMA-395:
--------------------------------------

great :)

I'm currently crawling around 105,000 sites along with their outlinks. Tomorrow I'm going to reduce
the dataset to an adjacency list and write a BSP parser for it.
I've decided to use Text.class as both key and value; the value is a semicolon-separated list of hosts.
Each element represents a normalized host such as stackoverflow.com or google.com.
So the key is the site and the value is the semicolon-separated list of its outlinks.
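
A minimal sketch of reading one such record's value back into a list of outlinks, assuming the value is exactly the semicolon-separated host list described above. The class and method names are hypothetical, and plain Strings stand in for Hadoop's Text so the example compiles on its own:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the "key = site, value = host;host;..." record layout. */
public final class SemicolonOutlinks {

    private SemicolonOutlinks() {}

    /** Splits a semicolon-separated value into its outlink hosts. */
    public static List<String> parseOutlinks(String value) {
        List<String> outlinks = new ArrayList<>();
        for (String host : value.split(";")) {
            String trimmed = host.trim();
            // Skip empty entries, e.g. from a trailing ";" or from "a;;b".
            if (!trimmed.isEmpty()) {
                outlinks.add(trimmed);
            }
        }
        return outlinks;
    }

    public static void main(String[] args) {
        String key = "stackoverflow.com";       // the site (record key)
        String value = "google.com;apache.org"; // its outlinks (record value)
        System.out.println(key + " -> " + parseOutlinks(value));
    }
}
```

Dropping empty entries keeps a stray trailing semicolon from producing a bogus empty host.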

Do you need any other input format, Steve?
As far as I can see, it parses a text file using StringTokenizer's default delimiters,
where the first token of a line is the page and the remaining tokens are its outlinks.
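
For comparison, a sketch of the line format as read here: first token is the page, the rest are outlinks, split on StringTokenizer's default delimiters (space, tab, newline, carriage return, form feed). This is an assumption about the existing parser's behavior, not its source; `TokenizedAdjacencyLine` and `Entry` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.StringTokenizer;

public final class TokenizedAdjacencyLine {

    /** One parsed line: a page and its outlinks (hypothetical type). */
    public static final class Entry {
        public final String page;
        public final List<String> outlinks;

        Entry(String page, List<String> outlinks) {
            this.page = page;
            this.outlinks = Collections.unmodifiableList(outlinks);
        }
    }

    private TokenizedAdjacencyLine() {}

    /** Empty for blank lines; otherwise first token = page, remaining tokens = outlinks. */
    public static Optional<Entry> parse(String line) {
        // No delimiter argument: StringTokenizer splits on " \t\n\r\f".
        StringTokenizer tokens = new StringTokenizer(line);
        if (!tokens.hasMoreTokens()) {
            return Optional.empty();
        }
        String page = tokens.nextToken();
        List<String> outlinks = new ArrayList<>();
        while (tokens.hasMoreTokens()) {
            outlinks.add(tokens.nextToken());
        }
        return Optional.of(new Entry(page, outlinks));
    }

    public static void main(String[] args) {
        parse("stackoverflow.com google.com apache.org")
            .ifPresent(e -> System.out.println(e.page + " -> " + e.outlinks));
    }
}
```

Returning Optional rather than null lets a file reader skip blank lines without a null check.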

> Example: PageRank
> -----------------
>
>                 Key: HAMA-395
>                 URL: https://issues.apache.org/jira/browse/HAMA-395
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp, examples
>    Affects Versions: 0.2.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-395-v1.patch, HAMA-395-v2.patch, HAMA-395-v3.patch, HAMA-395.patch
>
>
> I'd like to contribute my PageRank BSP as an example. 
> http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
> TODO:
> - refactor the partitioning from the SSSP patch in https://issues.apache.org/jira/browse/HAMA-359 (extract a utility class, etc.)
> - add a really cool web-sub-graph example dataset ;D
> - add a wiki page for it
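
As a reference point for the distributed example, a small sequential PageRank over an adjacency map (the shape either input format parses into). This is a plain power-iteration sketch, not the BSP code from the blog post or the patches; the damping factor, the fixed iteration count, and spreading the rank of dangling hosts (no outlinks) evenly over all vertices are assumptions of this sketch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class SequentialPageRank {

    private SequentialPageRank() {}

    /** Power iteration; ranks start uniform and always sum to 1. */
    public static Map<String, Double> rank(Map<String, List<String>> graph,
                                           double damping, int iterations) {
        // Every vertex, including hosts that only ever appear as outlinks.
        Set<String> vertices = new HashSet<>(graph.keySet());
        for (List<String> outs : graph.values()) {
            vertices.addAll(outs);
        }
        int n = vertices.size();

        Map<String, Double> ranks = new HashMap<>();
        for (String v : vertices) {
            ranks.put(v, 1.0 / n);
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> incoming = new HashMap<>();
            for (String v : vertices) {
                incoming.put(v, 0.0);
            }
            double danglingMass = 0.0;
            for (String v : vertices) {
                List<String> outs = graph.getOrDefault(v, List.of());
                double r = ranks.get(v);
                if (outs.isEmpty()) {
                    danglingMass += r; // assumption: redistributed uniformly below
                } else {
                    double share = r / outs.size();
                    for (String w : outs) {
                        incoming.merge(w, share, Double::sum);
                    }
                }
            }
            Map<String, Double> next = new HashMap<>();
            for (String v : vertices) {
                double in = incoming.get(v) + danglingMass / n;
                next.put(v, (1.0 - damping) / n + damping * in);
            }
            ranks = next;
        }
        return ranks;
    }
}
```

Usage: `SequentialPageRank.rank(graph, 0.85, 30)` with 0.85 as the commonly used damping value. A BSP version would presumably split the vertices across peers and exchange the per-outlink shares as messages in each superstep.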

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
