giraph-dev mailing list archives

From "Lukas Nalezenec (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-1000) Multi Output support
Date Wed, 25 Mar 2015 11:22:52 GMT


Lukas Nalezenec commented on GIRAPH-1000:

I have never used Hadoop MultipleOutputs. I evaluated it when it was new, but it was hard
to unit test. We decided to replace it in MapReduce with our own internal implementation.
In my humble opinion, MultipleOutputs is badly designed. Just my two cents.

I don't think there is much documentation on Giraph internals. You have to read the source code.
The code is well written and you will learn a lot. I don't know much about these parts of Giraph,
but if I know the answer, I will help you.

> Multi Output support
> --------------------
>                 Key: GIRAPH-1000
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, conf and scripts, graph
>    Affects Versions: 1.0.0, 1.1.0, 1.2.0-SNAPSHOT
>            Reporter: Alessio Arleo
>              Labels: features
> Hadoop natively supports multiple outputs. The objective is to extend Giraph to support
> multiple output formats during a single Giraph run.
> According to the official Hadoop apidocs*, to take advantage of multiple outputs the
> pattern is the following:
> - Modify the job submission
> - Modify the reducer class to write to the declared outputs
> Since Giraph jobs are executed as mappers, this approach (or at least its second
> part) is probably not feasible, so further investigation is necessary.
> *
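
For readers following the thread, the two-step pattern in the issue description (declare the outputs at job submission, then write each record to one of the declared outputs) can be sketched in plain Java without any Hadoop dependency. In Hadoop itself these steps correspond to MultipleOutputs.addNamedOutput(...) and MultipleOutputs.write(...). Everything below is hypothetical and only illustrative: the class NamedOutputs, its methods, and the output names "vertices" and "edges" are assumptions, not Hadoop or Giraph API, and this is not the internal implementation mentioned in the comment above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a multi-output writer; not part of Hadoop or Giraph. */
final class NamedOutputs {
    // One in-memory buffer per declared output name.
    private final Map<String, List<String>> sinks = new LinkedHashMap<>();

    /** Step 1: declare an output up front, as would happen at job submission. */
    void declare(String name) {
        if (sinks.putIfAbsent(name, new ArrayList<>()) != null) {
            throw new IllegalArgumentException("Output already declared: " + name);
        }
    }

    /** Step 2: write a record to one of the previously declared outputs. */
    void write(String name, String key, String value) {
        List<String> sink = sinks.get(name);
        if (sink == null) {
            throw new IllegalStateException("Undeclared output: " + name);
        }
        sink.add(key + "\t" + value);
    }

    /** Read-only view of what was written to one output, for inspection in tests. */
    List<String> records(String name) {
        List<String> sink = sinks.get(name);
        if (sink == null) {
            throw new IllegalStateException("Undeclared output: " + name);
        }
        return Collections.unmodifiableList(sink);
    }
}

class NamedOutputsDemo {
    public static void main(String[] args) {
        NamedOutputs outputs = new NamedOutputs();
        // Declare both outputs before any record is written.
        outputs.declare("vertices");
        outputs.declare("edges");
        // Route each record to the output it belongs to.
        outputs.write("vertices", "1", "0.25");
        outputs.write("edges", "1", "2");
        outputs.write("vertices", "2", "0.75");
        System.out.println(outputs.records("vertices"));
        System.out.println(outputs.records("edges"));
    }
}
```

Keeping the sinks in memory is a deliberate choice for this sketch: it lets the routing logic be checked in a unit test without a running job, which is the difficulty raised in the comment. A real integration would write through Hadoop record writers instead.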

This message was sent by Atlassian JIRA
