giraph-user mailing list archives

From: Avery Ching
Subject: Re: writing/emitting to HDFS
Date: Tue, 27 Sep 2011 05:23:18 GMT

Hi Claudio,

I think I understand what you are trying to do: a kind of distributed 
logging for debugging.  I think such a feature can definitely be 
useful.  Aggregators might be able to do what you want, then with things 
like, perhaps not just 
at the end of the application, but after each superstep, might be able 
to accomplish what you want.
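
To make the "after each superstep" idea concrete, here is a minimal sketch in plain Java. It is not Giraph's Aggregator API: the class SuperstepLog and its methods add() and flush() are hypothetical names, and the collector only gathers lines inside a single JVM, whereas a real aggregator would also combine values across workers. It shows debug lines being collected during a superstep and drained to a writer once the superstep ends.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for an aggregator that gathers debug lines (not a Giraph class). */
final class SuperstepLog {
    // Thread-safe queue, so several compute threads can add lines concurrently.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    /** Called by any vertex during a superstep. */
    void add(long superstep, String vertexId, String message) {
        pending.add(superstep + "\t" + vertexId + "\t" + message);
    }

    /** Called once after each superstep: drains the collected lines and returns how many were written. */
    int flush(Writer out) throws IOException {
        int written = 0;
        String line;
        while ((line = pending.poll()) != null) {
            out.write(line);
            out.write('\n');
            written++;
        }
        out.flush();
        return written;
    }
}
```

In a real job the Writer would presumably wrap an HDFS output stream, with one output file per superstep or per worker, since an HDFS file cannot be reopened and modified freely once written.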

Feel free to take a crack at the issue... let's see what interfaces make sense.


On 9/26/11 7:03 AM, Claudio Martella wrote:
> I'm really just trying to emit "results" into an HDFS file at
> different moments of the computation. I'm thinking of
> functionality like log.debug(), to give an example, where all the
> messages are collected from different workers at different supersteps.
> At the moment I've implemented this:
> which I assign to each vertex at preApplication() and close from each
> vertex at postApplication(). I'm not super happy about this solution.
> During this weekend though, I thought I might use an Aggregator to
> send my ResultSet object and use the Aggregator to write to disk. That
> would be a nice design and I could contribute the JIRA about storing
> Aggregator results.
> What do you think?
> On Fri, Sep 23, 2011 at 1:40 AM, Avery Ching wrote:
>> This is more of a limitation of the fact that files are immutable in HDFS.
>>   Any more insight on what you're trying to do?  Perhaps we can think of a
>> more general way to address the issue.
>> Avery
>> On 9/22/11 10:31 AM, Claudio Martella wrote:
>>> Hi Avery,
>>> thanks, yes it does. The question, though, is how to share the
>>> file handle between the vertices on the same node. I could open the
>>> file in preApplication() and close it in postApplication(), but
>>> I would potentially end up with as many files as there are vertices
>>> in the graph.
>>> Do you have any idea on this side? Maybe share somehow the handle and a
>>> lock?
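
One way to picture the "shared handle and a lock" idea is a reference-counted writer held in a static field, so that every vertex in the same JVM uses one file. The sketch below is plain Java under stated assumptions: SharedEmitter, acquire(), emit(), and release() are hypothetical names, not Giraph or Hadoop API, and a local file stands in for the HDFS stream. Only the first acquire() opens the file (the path passed by later callers is ignored), and only the last release() closes it.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: one writer per JVM, shared by all vertices on a worker. */
final class SharedEmitter {
    private static final Object LOCK = new Object();
    private static Writer writer; // guarded by LOCK
    private static int users;     // number of vertices holding the handle; guarded by LOCK

    private SharedEmitter() {}

    /** Call from each vertex's setup hook; only the first caller actually opens the file. */
    static void acquire(Path file) throws IOException {
        synchronized (LOCK) {
            if (users == 0) {
                writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
            }
            users++;
        }
    }

    /** Writes one line; the lock keeps lines from different vertices from interleaving. */
    static void emit(String line) throws IOException {
        synchronized (LOCK) {
            if (writer == null) {
                throw new IllegalStateException("emit() called before acquire()");
            }
            writer.write(line);
            writer.write('\n');
        }
    }

    /** Call from each vertex's teardown hook; the last caller closes the file. */
    static void release() throws IOException {
        synchronized (LOCK) {
            if (users == 0) {
                throw new IllegalStateException("release() without matching acquire()");
            }
            if (--users == 0) {
                writer.close();
                writer = null;
            }
        }
    }
}
```

With this shape, calling acquire() in preApplication() and release() in postApplication() from every vertex would yield one file per worker JVM rather than one per vertex. Each worker would still need its own file name, because an HDFS file has a single writer at a time.
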
>>> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching<>    wrote:
>>>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>>>> postApplication(), postSuperstep()) that can be overridden to do anything
>>>> you
>>>> like, for instance write out some data to an HDFS file.  We have an open
>>>> issue on outputting Aggregator values that is unassigned if you'd like to
>>>> take a look at it as well
>>>> (
>>>> Hope this helps,
>>>> Avery
>>>> On 9/22/11 7:34 AM, Claudio Martella wrote:
>>>>> Hello list,
>>>>> I have the need to emit to HDFS once in a while some Text. This
>>>>> doesn't happen necessarily at the end of the computation and I might
>>>>> need to emit something more complex than just the VertexValue, so I'd
>>>>> like more control than what the VertexWriter gives me.
>>>>> What do you suggest I might do to obtain a handle to an HDFS file (it
>>>>> can be in parts as well) to write to?
>>>>> Is there any code I can start looking at?
>>>>> Thanks!
>>>>> Claudio
