Mailing-List: contact giraph-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: giraph-user@incubator.apache.org
Message-ID: <4E875798.7060706@apache.org>
Date: Sat, 01 Oct 2011 11:10:32 -0700
From: Avery Ching <aching@apache.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6;
 rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: giraph-user@incubator.apache.org
CC: Jake Mannix <jake.mannix@gmail.com>
Subject: Re: On pre/post Application/Superstep contract
References: 
 <CAFJOoJdV4SyDhdQi4xq49orqLMdMvM7XuvCT9BvbWV0UFx+Lrg@mail.gmail.com>
 <CACYXym-C2LpfiVqK5yaVYZDtY=PzaqCFweyvXeKFFfZcBMu_uQ@mail.gmail.com>
 <CAFJOoJe+efGsFABjtMhh8vwBwF40vDyXoZt298ezsHfUkFyg=A@mail.gmail.com>
 <4E8603F4.6020803@apache.org>
 <CAM=XDd8TDVf7VKdxgbDnp4DgrV-80e-NSPMJGf_CNw-u1krG4w@mail.gmail.com>
 <CACYXym8aCVWW_0Ax_MonSVQ=xsgt43cSNUJcWbpC1duJJardxg@mail.gmail.com>
In-Reply-To: 
 <CACYXym8aCVWW_0Ax_MonSVQ=xsgt43cSNUJcWbpC1duJJardxg@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------060605000901090807050906"

This is a multi-part message in MIME format.
--------------060605000901090807050906
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Can you show me an example of the inner Context class idea?  Sounds 
interesting...

Another question is whether to have the 
(pre|post)(Application|Superstep)() methods executed once as an aggregate 
and passed to the workers, or executed per worker.  I think the former 
might be a little expensive, depending on how big the "Context" is.  
Executing them per worker probably makes the most sense.  Any other thoughts?

Maybe aggregator methods would be useful as well, say to do things like 
write out the aggregators for the entire application every now and then.  
Those would probably get executed on the master.  I think the current 
uses of the (pre|post)(Application|Superstep)() methods are fine with the 
per-worker way of thinking.
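
Just to make the per-worker option concrete, here is a rough sketch of how 
I picture it.  None of these names (WorkerContext, SketchVertex, 
LogContext, EmittingVertex, runWorker) exist in Giraph; they are made up 
for illustration, and the plain loop only stands in for whatever the 
framework would really do on each worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-worker context; none of these types exist in Giraph.
abstract class WorkerContext {
    void preApplication() {}
    void postApplication() {}
    void preSuperstep(long superstep) {}
    void postSuperstep(long superstep) {}
}

// Hypothetical vertex base class, parameterized by its context type so
// that compute() sees the concrete context without casts or reflection.
abstract class SketchVertex<C extends WorkerContext> {
    abstract void compute(C context, long superstep);
}

// Example user context: one shared log per worker (a stand-in for a
// shared output stream).
class LogContext extends WorkerContext {
    final List<String> log = new ArrayList<>();

    @Override void preApplication() { log.add("preApplication"); }
    @Override void postApplication() { log.add("postApplication"); }
    @Override void preSuperstep(long superstep) { log.add("preSuperstep " + superstep); }
    @Override void postSuperstep(long superstep) { log.add("postSuperstep " + superstep); }
}

class EmittingVertex extends SketchVertex<LogContext> {
    private final int id;

    EmittingVertex(int id) { this.id = id; }

    @Override
    void compute(LogContext context, long superstep) {
        // Typed access to the worker-wide state; no static variables needed.
        context.log.add("vertex " + id + " in superstep " + superstep);
    }
}

class WorkerContextSketch {
    // Simulates a single worker: each hook runs once per worker (and once
    // per superstep), never once per vertex.
    static <C extends WorkerContext> void runWorker(
            C context, List<? extends SketchVertex<C>> vertices, int supersteps) {
        context.preApplication();
        for (long s = 0; s < supersteps; s++) {
            context.preSuperstep(s);
            for (SketchVertex<C> vertex : vertices) {
                vertex.compute(context, s);
            }
            context.postSuperstep(s);
        }
        context.postApplication();
    }

    public static void main(String[] args) {
        LogContext context = new LogContext();
        List<EmittingVertex> vertices =
                Arrays.asList(new EmittingVertex(1), new EmittingVertex(2));
        runWorker(context, vertices, 2);
        for (String line : context.log) {
            System.out.println(line);
        }
    }
}
```

The point of the type parameter is that a vertex gets its own context 
type handed to it directly, so there is nothing to cast and nothing to 
look up by class name.  Whether that is what the inner Context class idea 
would look like is a guess on my part.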

Avery

On 10/1/11 7:06 AM, Jake Mannix wrote:
> On Sat, Oct 1, 2011 at 2:29 AM, Hyunsik Choi <hyunsik@apache.org 
> <mailto:hyunsik@apache.org>> wrote:
>
>     That way looks good for now. Later, we could probably improve it
>     along the lines of MapReduce's Context.
>
>
> ooooooh!  I really like that suggestion, actually.  If every BasicVertex has an
> inner Context class, we can allow user applications to define/extend their
> Context and we can avoid even doing any of this setClass() and reflection
> based stuff, if we do it right.  Typesafe context object FTW!
>
>   -jake
>
>
>     --
>     Hyunsik Choi
>     Database Lab, Korea University
>
>     On Sat, Oct 1, 2011 at 3:01 AM, Avery Ching <aching@apache.org
>     <mailto:aching@apache.org>> wrote:
>     > It isn't visible (purposefully) since it is internal state.
>     >
>     > That being said, I believe this type of functionality would be
>     > useful.  Right now there are a lot of ugly static variables stored
>     > in Vertex implementations because of it.  Perhaps we should add
>     > another method in GiraphJob
>     >
>     > final public void setWorkerObjectClass(Class<? extends Configurable>
>     > workerObjectClass);
>     >
>     > Then in BasicVertex
>     >
>     > public void preApplication(Configurable workerObject);
>     > public void postApplication(Configurable workerObject);
>     > public void preSuperstep(Configurable workerObject);
>     > public void postSuperstep(Configurable workerObject);
>     > public Configurable getWorkerObject();
>     >
>     > Anyone else think of a cleaner way to do it?
>     >
>     > Avery
>     >
>     > On 9/30/11 8:42 AM, Claudio Martella wrote:
>     >>
>     >> afaik getGraphState() is not visible to my object. Or?
>     >>
>     >> On Fri, Sep 30, 2011 at 5:23 PM, Jake
>     Mannix<jake.mannix@gmail.com <mailto:jake.mannix@gmail.com>>
>     >>  wrote:
>     >>>
>     >>> Remember that there's already a "singleton"-like object
>     available to all
>     >>> vertices: the GraphState object, which has a handle on the
>     GraphMapper.
>     >>> Maybe this is the right place to get your handle on the
>     >>> FSDataOutputStream?
>     >>>   -jake
>     >>> On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
>     >>> <claudio.martella@gmail.com
>     <mailto:claudio.martella@gmail.com>>  wrote:
>     >>>>
>     >>>> Hello,
>     >>>>
>     >>>> to my understanding, the pre/post Application/Superstep methods
>     >>>> are called ONCE on a "fake" vertex on each worker (the so-called
>     >>>> representativeVertex). This means that these methods should not
>     >>>> depend on any vertex-specific data.
>     >>>>
>     >>>> As I'm trying to sort out my Emitter, I thought I could create one
>     >>>> FSDataOutputStream per worker, which each Vertex belonging to that
>     >>>> worker could share (this would even be thread-safe, as each worker
>     >>>> is not parallel).
>     >>>>
>     >>>> The questions are:
>     >>>>
>     >>>> 1) how to share the FSDataOutputStream object that this
>     >>>> representativeVertex creates in preApplication() (and closes in
>     >>>> postApplication())?
>     >>>>
>     >>>> 2) about the filename, I'd be happy to have access to the Worker Id
>     >>>> so as to create an output filename, as happens with reducers and
>     >>>> part files in FileOutputFormat (i.e. <userdefinedfilename>-workerid).
>     >>>>
>     >>>>
>     >>>> The "best" idea I have in mind right now is to use the hashCode of
>     >>>> the calling vertex (the representativeVertex) as the id, and create
>     >>>> an external Singleton where I can register and request the output
>     >>>> files, similarly to what happens with Aggregators now, by passing
>     >>>> the *this* reference as an index into its map. Any better idea?
>     >>>> :)
>     >>>>
>     >>>>
>     >>>> --
>     >>>>     Claudio Martella
>     >>>> claudio.martella@gmail.com <mailto:claudio.martella@gmail.com>
>     >>>
>     >>
>     >>
>     >
>     >
>
>


--------------060605000901090807050906--