giraph-user mailing list archives

From Claudio Martella <>
Subject Re: writing/emitting to HDFS
Date Mon, 26 Sep 2011 14:03:38 GMT
I'm really just trying to emit "results" into an HDFS file at
different moments of the computation. I'm thinking of
functionality like log.debug(), to give an example, where all the
messages are collected from different workers at different supersteps.
At the moment I've implemented this:

which I assign to each vertex at preApplication() and close from each
vertex at postApplication(). I'm not super happy about this solution.
Over the weekend, though, I thought I might use an Aggregator to
send my ResultSet object and use the Aggregator to write to disk. That
would be a nice design, and I could contribute to the JIRA about storing
Aggregator results.
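
For reference, the shared-handle-plus-lock idea from my earlier message
could be sketched roughly as below. This is only a minimal sketch under
assumptions: SharedResultLog, WriterFactory, acquire(), emit() and
release() are hypothetical names, not Giraph API, and a plain
java.io.Writer stands in for the HDFS stream. In a real job the factory
would presumably wrap the stream returned by Hadoop's
FileSystem.create(Path), and each worker would need its own file name,
since an HDFS file has a single writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical per-JVM (per-worker) result log shared by all vertices. */
public final class SharedResultLog {

    /** Hypothetical factory so the sink is opened lazily, exactly once. */
    public interface WriterFactory {
        Writer open() throws IOException;
    }

    private static final Object LOCK = new Object();
    private static Writer writer; // shared handle, guarded by LOCK
    private static int users;     // acquire() calls not yet released

    private SharedResultLog() { }

    /** Intended for preApplication(): only the first caller opens the sink. */
    public static void acquire(WriterFactory factory) throws IOException {
        synchronized (LOCK) {
            if (users == 0) {
                writer = factory.open();
            }
            users++;
        }
    }

    /** Intended for any superstep, in the spirit of log.debug(). */
    public static void emit(long superstep, String vertexId, String text)
            throws IOException {
        synchronized (LOCK) {
            if (writer == null) {
                throw new IllegalStateException("emit() called before acquire()");
            }
            writer.write(superstep + "\t" + vertexId + "\t" + text + "\n");
        }
    }

    /** Intended for postApplication(): only the last caller closes the sink. */
    public static void release() throws IOException {
        synchronized (LOCK) {
            if (users == 0) {
                throw new IllegalStateException("release() without acquire()");
            }
            users--;
            if (users == 0) {
                writer.close();
                writer = null;
            }
        }
    }

    /** Small demo with an in-memory sink instead of an HDFS stream. */
    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        acquire(() -> sink);                // first vertex opens the sink
        acquire(() -> {                     // second vertex reuses it
            throw new IOException("sink must not be reopened");
        });
        emit(0, "v1", "hello");
        emit(1, "v2", "world");
        release();
        release();                          // last vertex closes the sink
        System.out.print(sink);             // two tab-separated result lines
    }
}
```

The reference count is what avoids ending up with one file per vertex:
every vertex calls acquire() and release(), but the sink is opened and
closed only once per worker JVM.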

What do you think?

On Fri, Sep 23, 2011 at 1:40 AM, Avery Ching <> wrote:
> This is more of a limitation of the fact that files are immutable in HDFS.
>  Any more insight on what you're trying to do?  Perhaps we can think of a
> more general way to address the issue.
> Avery
> On 9/22/11 10:31 AM, Claudio Martella wrote:
>> Hi Avery,
>> thanks, yes it does. The question, though, is how to share the
>> file handle between the vertices on the same node. I could open the
>> file in preApplication() and close it in postApplication(), but
>> I would potentially end up with as many files as there are vertices
>> in the graph.
>> Do you have any ideas on this? Maybe share the handle and a
>> lock somehow?
>> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching<>  wrote:
>>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>>> postApplication(), postSuperstep()) that can be overridden to do anything
>>> you like, for instance write out some data to an HDFS file.  We have an
>>> open issue on outputting Aggregator values that is unassigned if you'd
>>> like to take a look at it as well
>>> (
>>> Hope this helps,
>>> Avery
>>> On 9/22/11 7:34 AM, Claudio Martella wrote:
>>>> Hello list,
>>>> I need to emit some Text to HDFS once in a while. This
>>>> doesn't necessarily happen at the end of the computation, and I might
>>>> need to emit something more complex than just the VertexValue, so I'd
>>>> like more control than the VertexWriter gives me.
>>>> How would you suggest I obtain a handle to an HDFS file (it
>>>> can be in parts as well) to write to?
>>>> Is there any code I can start looking at?
>>>> Thanks!
>>>> Claudio

    Claudio Martella
