giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: writing/emitting to HDFS
Date Tue, 27 Sep 2011 05:23:18 GMT
Hi Claudio,

I think I understand what you are trying to do: a kind of distributed
logging for debugging.  I think such a feature can definitely be
useful.  Aggregators might be able to do what you want, and combined with
something like https://issues.apache.org/jira/browse/GIRAPH-10, applied
not just at the end of the application but after each superstep, they
might accomplish exactly that.

Feel free to take a crack at the issue, and let's see what interfaces make
sense.

Avery
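Avery's suggestion of emitting Aggregator values after every superstep (the GIRAPH-10 idea) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Giraph's Aggregator API: `LineAggregator` and `SuperstepDumper` are hypothetical names, the framework is assumed to hand the master one partial aggregate per worker at the end of each superstep, and a `PrintWriter` stands in for an HDFS output stream.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

/**
 * HYPOTHETICAL per-worker partial aggregate of result lines (not Giraph's
 * Aggregator interface). Vertices on one worker call aggregate() during a superstep.
 */
class LineAggregator {
    private final List<String> lines = new ArrayList<>();

    synchronized void aggregate(String line) {
        lines.add(line);
    }

    /** Returns the collected lines and clears them so the next superstep starts empty. */
    synchronized List<String> drain() {
        List<String> out = new ArrayList<>(lines);
        lines.clear();
        return out;
    }
}

/**
 * HYPOTHETICAL master-side step: after each superstep, merge every worker's
 * partial aggregate and append it to a single output, tagged with the superstep.
 */
class SuperstepDumper {
    private final PrintWriter out; // stand-in for an HDFS output stream

    SuperstepDumper(PrintWriter out) {
        this.out = out;
    }

    void dump(long superstep, List<LineAggregator> workerPartials) {
        for (LineAggregator partial : workerPartials) {
            for (String line : partial.drain()) {
                out.print(superstep + "\t" + line + "\n");
            }
        }
        // PrintWriter swallows IOExceptions; a real job should also check checkError().
        out.flush();
    }
}
```

Tagging each line with its superstep number keeps one output file readable as a log, much like the log.debug() analogy. Because only the master writes, a single process owns the file, which avoids the HDFS immutability problem mentioned further down the thread; the trade-off is that every emitted line travels through the aggregation path.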

On 9/26/11 7:03 AM, Claudio Martella wrote:
> I'm really just trying to emit "results" into an HDFS file at
> different moments of the computation. I'm thinking of functionality
> like log.debug(), to give an example, where all the messages from
> different workers at different supersteps are collected.
> At the moment I've implemented this:
>
> https://github.com/claudiomartella/graffiti/blob/master/src/main/java/org/acaro/graffiti/processing/GraffitiEmitter.java
>
> which I assign to each vertex at preApplication() and close from each
> vertex at postApplication(). I'm not super happy about this solution.
> Over the weekend, though, I thought I might send my ResultSet object
> through an Aggregator and have the Aggregator write it to disk. That
> would be a nice design, and I could contribute to the JIRA issue about
> storing Aggregator results.
>
> What do you think?
>
> On Fri, Sep 23, 2011 at 1:40 AM, Avery Ching <avery.ching@gmail.com> wrote:
>> This is more a limitation stemming from the fact that files in HDFS are
>> immutable. Can you give more insight into what you're trying to do?
>> Perhaps we can think of a more general way to address the issue.
>>
>> Avery
>>
>> On 9/22/11 10:31 AM, Claudio Martella wrote:
>>> Hi Avery,
>>>
>>> Thanks, yes it does. The question, though, is how to share the
>>> file handle among the vertices on the same node. I could open the
>>> file in preApplication() and close it in postApplication(), but
>>> I would potentially end up with as many files as there are vertices
>>> in the graph.
>>>
>>> Do you have any ideas on this? Maybe share the handle somehow, along
>>> with a lock?
>>>
>>> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching <aching@apache.org> wrote:
>>>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>>>> postApplication(), postSuperstep()) that can be overridden to do
>>>> anything you like, for instance writing out some data to an HDFS file.
>>>> We also have an open, unassigned issue on outputting Aggregator values,
>>>> if you'd like to take a look at it as well
>>>> (https://issues.apache.org/jira/browse/GIRAPH-10).
>>>>
>>>> Hope this helps,
>>>>
>>>> Avery
>>>>
>>>> On 9/22/11 7:34 AM, Claudio Martella wrote:
>>>>> Hello list,
>>>>>
>>>>> I need to emit some Text to HDFS once in a while. This doesn't
>>>>> necessarily happen at the end of the computation, and I might need
>>>>> to emit something more complex than just the VertexValue, so I'd
>>>>> like more control than what the VertexWriter gives me.
>>>>>
>>>>> What do you suggest I do to obtain a handle to an HDFS file (it
>>>>> can be in parts as well) to write to?
>>>>> Is there any code I can start looking at?
>>>>>
>>>>> Thanks!
>>>>> Claudio
>>>>>
>>>
>>
>
>
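Claudio's question about sharing one file handle among the vertices on a node can be answered with a reference-counted, synchronized writer: one instance per worker, opened by the first vertex that acquires it and closed by the last one that releases it. This is a minimal sketch, assuming one worker per JVM; `SharedWorkerWriter` is a hypothetical name, not a Giraph class, and the `Supplier<Writer>` opener is where a real job would create that worker's own part file (for example by wrapping the stream returned by Hadoop's `FileSystem.create(Path)`), since an HDFS file has a single writer and cannot be shared across workers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Supplier;

/**
 * HYPOTHETICAL per-worker emitter (not a Giraph class). All vertices in one
 * worker JVM share a single Writer; the object's monitor serializes access.
 */
final class SharedWorkerWriter {
    private static SharedWorkerWriter instance; // assumption: one worker per JVM

    private final Supplier<Writer> opener; // e.g. opens this worker's part file
    private Writer writer;                 // null while nobody holds a reference
    private int refCount;

    private SharedWorkerWriter(Supplier<Writer> opener) {
        this.opener = opener;
    }

    /** Returns the JVM-wide instance; the opener is only used on the first call. */
    static synchronized SharedWorkerWriter get(Supplier<Writer> opener) {
        if (instance == null) {
            instance = new SharedWorkerWriter(opener);
        }
        return instance;
    }

    /** Call from preApplication(): the first caller opens the file. */
    synchronized void acquire() {
        if (refCount == 0) {
            writer = opener.get();
        }
        refCount++;
    }

    /** Writes one line; concurrent callers never interleave partial lines. */
    synchronized void emit(String line) {
        if (writer == null) {
            throw new IllegalStateException("emit() called before acquire()");
        }
        try {
            writer.write(line);
            writer.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call from postApplication(): the last caller closes the file. */
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        if (refCount == 0) {
            try {
                writer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                writer = null;
            }
        }
    }
}
```

Each vertex would call `acquire()` from its `preApplication()` override and `release()` from `postApplication()`; the counting keeps this correct whether those hooks run once per vertex or once per worker, and the node ends up with one file instead of one per vertex. Note that an `acquire()` after the count has dropped back to zero calls the opener again, which for HDFS would recreate the part file.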

