giraph-user mailing list archives

From Yuanyuan Tian <yt...@us.ibm.com>
Subject Re: Question about range partitioner and data locality
Date Fri, 25 May 2012 17:57:29 GMT
<font face="Default Sans Serif,Verdana,Arial,Helvetica,sans-serif" size="2"> <span>I
am not suggesting changing the current range partitioner, since it is designed for the general
case. I want to write a special partitioner, based on the existing range partitioner, that handles
this special situation, but I don't know how. <br><br>Yuanyuan
<br><br></span><font color="#990099">-----Avery Ching &lt;aching@apache.org&gt;
wrote: -----</font><div style="padding-left:5px;"><div style="padding-right:0px;padding-left:5px;border-left:solid
black 2px;">To: user@giraph.apache.org<br>From: Avery Ching &lt;aching@apache.org&gt;<br>Date:
05/24/2012 11:59PM<br>Subject: Re: Question about range partitioner and data locality<br><br>
 
 
  
    You are definitely right that the old version of Giraph supported
    ranges well for loading, but it could not support hash-based
    distribution (which spreads memory much more evenly across workers).&nbsp;
    It also made a lot of assumptions (the data within each split had to
    fall in a unique range and be sorted).<br>
    <br>
    Unless we make these types of assumptions, it would be pretty hard to
    do.&nbsp; One way might be to have all the workers examine each input
    split, with each input split providing information about its vertex
    range.&nbsp; If the worker owns that range (or part of it), it would
    attempt to load some or all of the vertices in that split.&nbsp;
    Otherwise, it would try the next split.<br>
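The split-scanning idea can be sketched roughly as follows. This is a minimal, self-contained Java sketch, not Giraph code: the `Split` class, its `minId`/`maxId` range metadata, and `selectLocalSplits` are hypothetical names, and it assumes each split can cheaply report the smallest and largest vertex id it contains and that each worker owns one contiguous id range.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the split-scanning proposal: every worker looks at
 * every input split, and each split advertises the [minId, maxId] vertex-id
 * range it contains. None of these classes are Giraph APIs.
 */
public class SplitRangeSelector {

    /** Hypothetical range metadata an input split would expose. */
    public static final class Split {
        final String name;
        final long minId; // smallest vertex id in the split (inclusive)
        final long maxId; // largest vertex id in the split (inclusive)

        public Split(String name, long minId, long maxId) {
            this.name = name;
            this.minId = minId;
            this.maxId = maxId;
        }

        @Override
        public String toString() {
            return name + "[" + minId + ", " + maxId + "]";
        }
    }

    /**
     * For a worker owning ids [workerMin, workerMax], returns the splits it
     * should read, each clipped to the ids it actually owns ("some or all of
     * the vertices"). Splits with no overlap are skipped.
     */
    public static List<Split> selectLocalSplits(List<Split> splits, long workerMin, long workerMax) {
        List<Split> selected = new ArrayList<>();
        for (Split s : splits) {
            long lo = Math.max(s.minId, workerMin);
            long hi = Math.min(s.maxId, workerMax);
            if (lo <= hi) {
                // Overlap: load only the vertices whose ids fall in [lo, hi].
                selected.add(new Split(s.name, lo, hi));
            }
            // No overlap: move on to the next split.
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new Split("part-0", 0, 999),
                new Split("part-1", 1000, 1999),
                new Split("part-2", 2000, 2999));
        // A worker that owns vertex ids 500..1999.
        System.out.println(selectLocalSplits(splits, 500, 1999));
    }
}
```

Running `main` prints `[part-0[500, 999], part-1[1000, 1999]]`: the worker loads the upper half of `part-0` and all of `part-1`, and skips `part-2`. A real implementation would also have to guarantee that every vertex is claimed by exactly one worker, which this sketch does not address.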
    <br>
    Any other ideas?<br>
    <br>
    Avery<br>
   <br>
    On 5/23/12 5:36 PM, Yuanyuan Tian wrote:
    <blockquote cite="mid:OF8290E988.C86A3F69-ON85257A08.00010132-88257A08.00034FB5@us.ibm.com"
type="cite"><font face="sans-serif" size="2">Hi,</font>
      <br>
 
    <br>
      <font face="sans-serif" size="2">I want to use better partitions of the
        input graph for my algorithm running on Giraph. So I partitioned my
        input graph and relabeled the vertex ids so that the vertex ids of the
        same partition are in a consecutive range. I also reorganized the input
        file so that the vertices in the same range are together. I used the
        range partitioner for the Giraph job to take advantage of these better
        partitions. However, the vertex loader still looks up the partition id
        of each vertex and ships the vertex to the worker that owns the
        partition. Since I have already prepared my data this way, in the ideal
        case I could just keep all the vertices of an input split local to the
        corresponding worker. Is there an easy way to do this? I know that in
        the very old version of Giraph, there was no partitioner, and users had
        to prepare the partitions themselves. I essentially want to do a similar
        thing in the current version of Giraph. Please give me a pointer or two
        on how to do this.</font>
      <br>
      <br>
 
    <font face="sans-serif" size="2">Thanks,</font>
      <br>
      <br>
     <font face="sans-serif" size="2">Yuanyuan</font>
    </blockquote>
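For reference, the lookup a range partitioner performs once vertex ids are relabeled into consecutive ranges can be sketched as below. This is an illustrative Java sketch under stated assumptions, not Giraph's range partitioner or its API: `RangeLookup`, `partitionOf`, and `isLocal` are hypothetical names, partition `p` is assumed to start at `rangeStarts[p]` (sorted ascending), and the partition-to-worker assignment is assumed to be known up front. `isLocal` is the kind of check a loader would need in order to keep a vertex on the worker that read it instead of shipping it.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical range-partition lookup (not Giraph's partitioner API).
 * Assumes vertex ids were relabeled so each partition owns one consecutive
 * id range, and partition p starts at rangeStarts[p].
 */
public class RangeLookup {
    // First vertex id of each range -> partition id.
    private final TreeMap<Long, Integer> rangeStartToPartition = new TreeMap<>();
    // Partition id -> id of the worker that owns it.
    private final Map<Integer, Integer> partitionToWorker;

    public RangeLookup(long[] rangeStarts, Map<Integer, Integer> partitionToWorker) {
        for (int p = 0; p < rangeStarts.length; p++) {
            rangeStartToPartition.put(rangeStarts[p], p);
        }
        this.partitionToWorker = partitionToWorker;
    }

    /** Partition containing vertexId: the one with the greatest range start <= vertexId. */
    public int partitionOf(long vertexId) {
        Map.Entry<Long, Integer> e = rangeStartToPartition.floorEntry(vertexId);
        if (e == null) {
            throw new IllegalArgumentException("vertex id below first range: " + vertexId);
        }
        return e.getValue();
    }

    /** True if the vertex belongs to localWorker, i.e. it would not need to be shipped. */
    public boolean isLocal(long vertexId, int localWorker) {
        return partitionToWorker.get(partitionOf(vertexId)) == localWorker;
    }

    public static void main(String[] args) {
        // Three partitions: ids [0, 1000), [1000, 2000), [2000, ...); partition p on worker p.
        RangeLookup lookup = new RangeLookup(new long[] {0L, 1000L, 2000L},
                Map.of(0, 0, 1, 1, 2, 2));
        System.out.println(lookup.partitionOf(1500)); // 1
        System.out.println(lookup.isLocal(1500, 1));  // true
        System.out.println(lookup.isLocal(2500, 1));  // false
    }
}
```

`TreeMap.floorEntry` gives an O(log P) lookup for P partitions. If every vertex of a split maps to the reading worker, as in the pre-grouped input described above, the loader could keep the whole split local.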
   <br>
  

</div></div><div></div></font>
