giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avery Ching <ach...@apache.org>
Subject Re: Question about range partitioner and data locality
Date Fri, 25 May 2012 18:07:26 GMT
Writing a range based partitioner is for potentially reducing the number 
of messages between workers (i.e. reverse lexical ordering of urls for 
page rank).  Without changes to the input splits loading, the average 
number vertices shipped during the input superstep will be the same as 
the using the hash partitioner.  Is this what you are trying to achieve?

Avery

On 5/25/12 10:57 AM, Yuanyuan Tian wrote:
> I am not suggesting to change the current range partitioner, as it is 
> designed for a general case. I want to write a special partitioner 
> based on the existing range partitioner to achieve what I want to do 
> in this special situation, but I don't know how.
>
> Yuanyuan
>
> -----Avery Ching <aching@apache.org> wrote: -----
> To: user@giraph.apache.org
> From: Avery Ching <aching@apache.org>
> Date: 05/24/2012 11:59PM
> Subject: Re: Question about range partitioner and data locality
>
> You are definitely right that the old version of Giraph supported 
> ranges pretty well for loading, but could not support hash based 
> distribution (much better for memory distribution across workers).  It 
> also made a lot of assumptions (the data within each split was in a 
> unique range and sorted).
>
> Unless we make these type of assumptions, it would be pretty hard to 
> do.  One way might be to have all the workers examine each input 
> split, and each input split would provide on information as to its 
> range.  If the worker matches that range, it would attempt to load 
> some or all of the vertices in that split.  Otherwise, it would try 
> the next split.
>
> Any other ideas?
>
> Avery
>
> On 5/23/12 5:36 PM, Yuanyuan Tian wrote:
>> Hi,
>>
>> I want to use better partitions of input graph for my algorithm 
>> running on Giraph. So, I partitioned my input graph and re-labeled 
>> the vertex ids so that vertex ids of the same partition are in a 
>> consecutive range. I also reorganized the input file so that the 
>> vertices in the same range are together. I used the range partitioner 
>> for the Giraph job to utilize the better partitions. However, the 
>> vertex loader still looks for the partition id of each vertex and 
>> ship it to the worker that owns the partition. On the other hand, I 
>> have already prepared my data in a nice way, in the ideal case, I can 
>> just keep all the vertices of an inputsplit local to the 
>> corresponding worker. Is there an easy way to do this? I know that in 
>> the very old version of giraph, giraph doesn't have a partitioner. 
>> The users have to prepare the partitions. I essentially want to do a 
>> similar thing in the current version of giraph. Please give me a 
>> pointer or two on how to do this.
>>
>> Thanks,
>>
>> Yuanyuan 
>



Mime
View raw message