hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-743) GSoC 2013, Accumulo/HBase's webtable and Hama's PageRank
Date Mon, 11 Mar 2013 13:33:12 GMT

    [ https://issues.apache.org/jira/browse/HAMA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598809#comment-13598809
] 

Edward J. Yoon commented on HAMA-743:
-------------------------------------

First of all, you'll need to understand Bigtable's data model. Each cell is addressed by a
(row, column, timestamp) triple, so the table can be pictured as a 3D cube. Rows are kept in
lexicographic order by row key.
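
To make the data model concrete, here is a minimal in-memory sketch in plain Java. It is only an illustration: the class name CellStore and its methods are made up for this example and are not part of the Accumulo, HBase, or Hama APIs. Nested sorted maps stand in for the three dimensions, and the outer TreeMap keeps row keys in sorted order.

```java
import java.util.TreeMap;

// Hypothetical toy model of a (row, column, timestamp) -> value store.
class CellStore {
    // row -> column -> timestamp -> value; TreeMap iterates row keys in sorted order.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Returns the value with the largest timestamp, or null if the cell does not exist.
    String getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        return versions.lastEntry().getValue();
    }

    // Row keys in sorted order, which is what makes key-range scans cheap.
    Iterable<String> rowKeys() {
        return rows.keySet();
    }
}
```

Ignoring the timestamp dimension, as suggested for the webtable exercise, amounts to always reading through getLatest.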

Once you feel ready, try to create the webtable described in Google's Bigtable paper (the row key is
the URL, and the column families are anchor, contents, charset, etc.). Please ignore the timestamp
dimension to avoid complexity. You'll then realize that the rows together with the 'anchor' column
family form an inlink-by-outlink sparse matrix.
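
The matrix view can be sketched as follows. In the Bigtable paper's webtable, each column in the anchor family is named after the page that links to the row's page, so a row lists its inlinks. The code below is a rough sketch under assumptions: the table is a plain in-memory map, column names look like "anchor:<source URL>", and the class AnchorMatrix is hypothetical rather than an Accumulo or HBase API. It transposes the inlink view into an outlink adjacency list, which is the shape a PageRank job would want.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: reads the 'anchor' family as a sparse link matrix.
class AnchorMatrix {
    static final String ANCHOR_PREFIX = "anchor:";

    // webtable: row key (target URL) -> column name -> cell value.
    // Returns: source URL -> set of target URLs it links to.
    static Map<String, Set<String>> outlinks(Map<String, Map<String, String>> webtable) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : webtable.entrySet()) {
            String target = row.getKey();
            // Register every page as a vertex, even if it has no outlinks.
            out.computeIfAbsent(target, k -> new TreeSet<>());
            for (String column : row.getValue().keySet()) {
                if (column.startsWith(ANCHOR_PREFIX)) {
                    // The column qualifier names the page that links to this row.
                    String source = column.substring(ANCHOR_PREFIX.length());
                    out.computeIfAbsent(source, k -> new TreeSet<>()).add(target);
                }
            }
        }
        return out;
    }
}
```

Columns from other families (contents, charset, and so on) are skipped, so only the non-zero entries of the link matrix are ever materialized.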

The next step is the PageRank calculation. Read Google's Pregel paper and look at Hama's implementation.
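
For orientation before reading the Hama example, here is a single-machine sketch of PageRank organized as synchronous supersteps in the Pregel style. It deliberately does not use Hama's graph API; the class PageRankSketch, the damping parameter, and the choice to spread the rank of pages without outlinks evenly over all pages are assumptions of this illustration, not a description of how PageRank.java works.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process illustration of superstep-style PageRank.
class PageRankSketch {
    // outlinks: vertex -> targets. Assumes every target also appears as a key.
    static Map<String, Double> pageRank(Map<String, List<String>> outlinks,
                                        int supersteps, double damping) {
        int n = outlinks.size();
        Map<String, Double> rank = new HashMap<>();
        for (String v : outlinks.keySet()) {
            rank.put(v, 1.0 / n);
        }
        for (int step = 0; step < supersteps; step++) {
            // "Send" phase: each vertex splits its rank among its outlinks.
            Map<String, Double> inbox = new HashMap<>();
            double dangling = 0.0;
            for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
                double r = rank.get(e.getKey());
                List<String> targets = e.getValue();
                if (targets.isEmpty()) {
                    dangling += r; // no outlinks: rank is redistributed evenly below
                    continue;
                }
                double share = r / targets.size();
                for (String t : targets) {
                    inbox.merge(t, share, Double::sum);
                }
            }
            // "Compute" phase: each vertex combines its incoming messages.
            Map<String, Double> next = new HashMap<>();
            for (String v : outlinks.keySet()) {
                double received = inbox.getOrDefault(v, 0.0) + dangling / n;
                next.put(v, (1.0 - damping) / n + damping * received);
            }
            rank = next; // barrier: all vertices switch to the new ranks together
        }
        return rank;
    }
}
```

In a BSP framework the inbox would be real messages exchanged between peers and the assignment at the end of the loop would be the synchronization barrier.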

References:

 - http://svn.apache.org/repos/asf/accumulo/contrib/bsp/trunk/src/main/java/org/apache/accumulo/bsp/
 - http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/PageRank.java
                
> GSoC 2013, Accumulo/HBase's webtable and Hama's PageRank
> --------------------------------------------------------
>
>                 Key: HAMA-743
>                 URL: https://issues.apache.org/jira/browse/HAMA-743
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2013, mentor
>
> You'll learn about and experiment with Google's Bigtable and Pregel by using Apache
> Accumulo and Hama.
> The implementation issues are an input formatter and a partitioner for extracting the 2D matrix
> from the webtable and partitioning splits by key range.
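
As a rough idea of what partitioning by key range could look like, the sketch below assigns a row key to a partition by binary search over sorted split points. It is an assumption-laden illustration: KeyRangePartitioner is a hypothetical standalone class, it does not implement Hama's partitioner interface, and it compares keys with Java's String ordering rather than the byte ordering a real table would use.

```java
import java.util.Arrays;

// Hypothetical range partitioner: n split points define n + 1 key ranges.
class KeyRangePartitioner {
    // Sorted split points; each one is the first key of the next range.
    private final String[] splitPoints;

    KeyRangePartitioner(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    int numPartitions() {
        return splitPoints.length + 1;
    }

    int partitionFor(String rowKey) {
        int pos = Arrays.binarySearch(splitPoints, rowKey);
        // Found at index i: the key equals split point i, which starts range i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is the index of the range that contains the key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

Because rows are stored in sorted order, each partition then corresponds to one contiguous scan of the table, so a peer can read its block of the matrix without touching the others.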

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
