hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [jira] [Commented] (HAMA-580) Improve input of graph module
Date Thu, 24 May 2012 08:25:39 GMT
> However, it might be worth considering that Giraph supports all input
> formats and has an input key/value-to-vertex parser that runs when
> loading vertices.
> This would shift the responsibility to the user and we would remove
> Writability of the vertices, thus removing the VertexWritable classes.

+1
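
To make the quoted proposal concrete, here is a minimal sketch of what a user-supplied key/value-to-vertex parser could look like. Everything in it is hypothetical: `VertexParser`, `SimpleVertex`, and `AdjacencyListParser` are illustrative names, not Hama or Giraph classes, and the tab-separated adjacency-list layout is an assumption about the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are Hama or Giraph classes.
class VertexParserSketch {

    /** Minimal in-memory vertex: an id plus the ids of its out-neighbours. */
    static final class SimpleVertex {
        final String id;
        final List<String> neighbours;

        SimpleVertex(String id, List<String> neighbours) {
            this.id = id;
            this.neighbours = neighbours;
        }
    }

    /** User-supplied hook that turns one input record into a vertex. */
    interface VertexParser<K, V> {
        SimpleVertex parse(K key, V value);
    }

    /**
     * Parses an assumed line layout "vertexId<TAB>n1<TAB>n2...".
     * The key (for example a byte offset in a text file) is ignored.
     */
    static final class AdjacencyListParser implements VertexParser<Long, String> {
        @Override
        public SimpleVertex parse(Long key, String value) {
            String[] fields = value.split("\t");
            // First field is the vertex id, the rest are neighbour ids.
            List<String> neighbours =
                new ArrayList<>(Arrays.asList(fields).subList(1, fields.length));
            return new SimpleVertex(fields[0], neighbours);
        }
    }

    public static void main(String[] args) {
        VertexParser<Long, String> parser = new AdjacencyListParser();
        SimpleVertex v = parser.parse(0L, "A\tB\tC");
        System.out.println(v.id + " -> " + v.neighbours); // prints "A -> [B, C]"
    }
}
```

With a hook like this, the framework would only need to hand each input record to the parser while loading, so the vertices themselves would not have to be Writable.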

On Thu, May 24, 2012 at 4:30 PM, Thomas Jungblut
<thomas.jungblut@googlemail.com> wrote:
> Can't post to jira because it is down or has high latency.
>
> I dislike the idea as well, but it is the optimal way to write the
> vertices.
> Consider the Wikipedia linkset: 1 GB of text data as an adjacency list.
> With the current trunk version it takes up to 10 GB.
> I haven't measured how it looks with that patch, but I assume that it
> will be less than 1 GB.
> With a 64 MB chunk size in HDFS, that means 160 BSP tasks to be
> launched, as opposed to 16 in the optimal case.
> I don't know if that's an argument for you. Compatibility with MapReduce
> shouldn't be our first aim; we can make a BSP job out of the random graph
> generator.
> However, it might be worth considering that Giraph supports all input
> formats and has an input key/value-to-vertex parser that runs when
> loading vertices.
> This would shift the responsibility to the user and we would remove
> Writability of the vertices, thus removing the VertexWritable classes.
>
> If you have a good trade-off idea, let me know.
>
>
> 2012/5/24 Edward J. Yoon (JIRA) <jira@apache.org>
>
>>
>>    [
>> https://issues.apache.org/jira/browse/HAMA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282244#comment-13282244]
>>
>> Edward J. Yoon commented on HAMA-580:
>> -------------------------------------
>>
>> I dislike this idea. It makes programming complex and discourages the use
>> of existing Mappers/Reducers, e.g., Reducer, LongSumReducer, ...
>>
>> > Improve input of graph module
>> > -----------------------------
>> >
>> >                 Key: HAMA-580
>> >                 URL: https://issues.apache.org/jira/browse/HAMA-580
>> >             Project: Hama
>> >          Issue Type: Improvement
>> >          Components: graph
>> >    Affects Versions: 0.5.0
>> >            Reporter: Thomas Jungblut
>> >            Assignee: Thomas Jungblut
>> >             Fix For: 0.5.0
>> >
>> >         Attachments: HAMA-580.patch, HAMA-580_1.patch
>> >
>> >
>> > Currently it is too verbose: the Wikipedia dataset is going to be
>> > bloated from 0.95 GB to 5 GB just because it is writing the classes x times.
>>
>
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
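
The task counts quoted above (160 versus 16) follow from dividing the input size by the HDFS chunk size. A minimal sketch of that arithmetic, assuming one BSP task per chunk as the message implies; `SplitCount` and `chunks` are hypothetical names used only for illustration:

```java
// Hypothetical helper, not part of Hama: counts fixed-size chunks for an input.
class SplitCount {

    /** Number of chunks needed to cover inputBytes (ceiling division). */
    static long chunks(long inputBytes, long chunkBytes) {
        return (inputBytes + chunkBytes - 1) / chunkBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long gb = 1024L * mb;
        long chunk = 64L * mb; // chunk size from the message

        // Assumption: one BSP task is launched per chunk.
        System.out.println(chunks(10L * gb, chunk)); // prints 160
        System.out.println(chunks(1L * gb, chunk));  // prints 16
    }
}
```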



-- 
Best Regards, Edward J. Yoon
@eddieyoon
