giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-524) Giraph can receive input from vertex or edge-centric data sets; its output is graph data, not "vertices"
Date Thu, 21 Feb 2013 19:54:12 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583491#comment-13583491 ]

Eli Reisman commented on GIRAPH-524:
------------------------------------

Good discussion, and good points. I guess that, essentially, our output formats can (and do)
output the whole resulting graph state at the end of a job, and can choose to output vertex
values, edge weights, just the graph structure itself, or any combination of these.

It's a great point that, regardless of the data, this is output at the granularity of "one
vertex's worth of graph data per record." But on that same line of thinking, our output
inevitably ends up as semi-structured input data for another MR job. In that situation, we
tell MR what it is seeing in each record and what to do with it. So it's really just MR input
data, the same as any other semi-structured input to a Pig job or whatever.

The Giraph job has produced useful output data, and it doesn't matter to the next user of
that data whether the values in each record are edge weights or vertex values. That user is
just processing the data in each record for its own workflow; it only needs to know that each
record contains values that pass sanity checks and are formatted correctly for extraction
into data structures. The way the data was produced in Giraph only mattered to Giraph.
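
As a rough illustration of that point (this is not Giraph code; the tab-separated record
layout and the names `Record` and `parse_record` are assumptions made up for this sketch), a
downstream consumer might treat each output line as an opaque record, check it, and extract
the values without caring what they meant inside the graph:

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    # Hypothetical layout: a key followed by one or more numeric values.
    # The consumer does not know (or care) whether the values were
    # vertex values or edge weights when Giraph wrote them.
    key: str
    values: List[float]


def parse_record(line: str) -> Optional[Record]:
    """Parse one tab-separated line; return None if it fails sanity checks."""
    fields = line.rstrip("\n").split("\t")
    # Sanity check: need a non-empty key and at least one value field.
    if len(fields) < 2 or not fields[0]:
        return None
    try:
        values = [float(f) for f in fields[1:]]
    except ValueError:
        # A non-numeric value means the record is malformed for this consumer.
        return None
    return Record(fields[0], values)
```

Under these assumptions, a line such as "v1<TAB>0.5<TAB>2.0" becomes a key plus a list of
floats, and anything malformed is simply dropped by the caller.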

Does this make any sense? Or, in practice, do you find that some assumptions about the "graph
structure" of the data as it was produced are expected or used in, say, a Hive job run
afterwards on that same output data? I guess if the data is going right back into another
Giraph job this would be the case.

                
> Giraph can receive input from vertex or edge-centric data sets; its output is graph data, not "vertices"
> --------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-524
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-524
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>            Reporter: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>
> It is silly to have any of our Output format names tied to the "vertex" when in fact
> we are just outputting graph data. The output format names should reflect the formatting of
> the output, and perhaps which elements of the graph data you want in the output.
> Let's change those names? Then they get shorter too as a bonus.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
