hama-dev mailing list archives

From "Thomas Jungblut (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-493) Provide text to seq-file utils for graph examples
Date Fri, 27 Jan 2012 17:26:10 GMT

     [ https://issues.apache.org/jira/browse/HAMA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Jungblut updated HAMA-493:
---------------------------------

    Attachment: HAMA-493.patch


Here is the example usage: 

{noformat}
~$ /usr/local/hama/bin/hama jar /usr/local/hama/hama-examples-0.4.0-incubating-SNAPSHOT.jar \
     pagerank-text2seq /tmp/test_seq/in.txt hdfs://localhost:9000/tmp/test_seq/out.seq
12/01/27 17:33:55 INFO util.TextToSequenceFile: Processing file : file:/tmp/test_seq/in.txt
12/01/27 17:33:55 INFO util.TextToSequenceFile: Written 246 to hdfs://localhost:9000/tmp/test_seq/out.seq/in.txt.seq
{noformat}

Then you can run pagerank on it:

{noformat}
~$ /usr/local/hama/bin/hama jar /usr/local/hama/hama-examples-0.4.0-incubating-SNAPSHOT.jar \
     pagerank /tmp/test_seq/out.seq/ /tmp/test_seq/out/
{noformat}

It works similarly with SSSP.
In both, you can customize the separator string that delimits the records.
Play around with it a bit. It also lets people use regexes in their paths and can
transform multiple text files into sequence files.
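
To illustrate the separator option, here is a minimal sketch of how one text record could be split before it is written out. It is not the code from the attached patch: the class name AdjacencyLineParser, the Record type, and the assumed line layout (a vertex id followed by its neighbor ids, all joined by the separator) are hypothetical and only serve the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of HAMA-493.patch.
// Assumed line layout: <vertex><sep><neighbor><sep><neighbor>...
final class AdjacencyLineParser {

    /** A vertex id together with the ids of its outgoing neighbors. */
    static final class Record {
        final String vertex;
        final List<String> neighbors;

        Record(String vertex, List<String> neighbors) {
            this.vertex = vertex;
            this.neighbors = neighbors;
        }
    }

    /** Splits one line on a literal separator string. */
    static Record parse(String line, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        // Pattern.quote makes separators such as "|" or "." match literally
        // instead of being interpreted as regular expressions.
        String[] parts = line.split(Pattern.quote(separator));
        String vertex = parts.length > 0 ? parts[0].trim() : "";
        if (vertex.isEmpty()) {
            throw new IllegalArgumentException("line has no vertex id: '" + line + "'");
        }
        List<String> neighbors = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            String token = parts[i].trim();
            if (!token.isEmpty()) { // skip empty fields from doubled separators
                neighbors.add(token);
            }
        }
        return new Record(vertex, neighbors);
    }

    public static void main(String[] args) {
        Record r = parse("1\t2\t3", "\t");
        System.out.println(r.vertex + " -> " + r.neighbors);
    }
}
```

In the real utility each parsed record would then be appended to a sequence file; that part is left out here because it depends on the Hadoop API and on the key/value types the examples expect.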

BTW, we should delete the partition directory inside the input directory once the job has
run; otherwise the user gets "Not a file" errors when rerunning the job.
Didn't we have a cleanup issue for that?
I added a step to the FileInputFormat that removes the partition directory. Please review
it and say whether you are okay with this solution.

One more thing: once a task has thrown an exception, we should kill the whole job.
Right now it just hangs forever. Is that because the task doesn't report back to the groom?

However, I should still add test cases for this patch and document the public methods.
                
> Provide text to seq-file utils for graph examples
> -------------------------------------------------
>
>                 Key: HAMA-493
>                 URL: https://issues.apache.org/jira/browse/HAMA-493
>             Project: Hama
>          Issue Type: New Feature
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>             Fix For: 0.4.0
>
>         Attachments: HAMA-493.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
