hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Partitioner in Hama
Date Wed, 09 Jan 2013 08:37:09 GMT
Sorry, I confused the terms. ;)

On Wed, Jan 9, 2013 at 3:00 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> Let's not use the term "runtime partitioning" at this time.
>
> Originally,
>
>  * Partitioning was handled by single client-side 'BSPJobClient'.
>  * And there was separate partition-processing logic in
> GraphJobRunner, called run-time partitioning.
>
> And now, by using BSP job for partitioning input-data, we can process
> read and write operations in parallel. Also, data locality is
> preserved at least for read operations. Above all things, we can
> specify the number of BSP tasks now.
>
> If we want to implement network-based run-time partitioning, it should
> be processed before BSP's setup() method internally. I think we can
> hold the run-time partitioning for later on.
>
> On Wed, Jan 9, 2013 at 8:56 AM, Suraj Menon <surajsmenon@apache.org> wrote:
>>> Keeping run-time (network-based) partitioning within GraphJobRunner is
>>> not a good idea.
>>
>>
>> It is not. I think I got testSubmitGraph to runtime-partition (in a
>> preprocessing step) the single file into 2 files in the unit tests with the
>> current state of my patch.
>>
>>
>>> >> - the number of splits found are not equal to the number of BSP tasks
>>> >> configured for the job. OR
>>>
>>> I have a question. If the input is an unsorted map and I want to
>>> re-partition by hashing, but the number of blocks and the number of desired
>>> tasks are the same, then what happens? Do you mean run-time partitioning?
>>
>> You will have a runtime partitioner class defined and the partitioning flag
>> on by default. For the case of HAMA-561, a user can switch off partitioning
>> using the same flag.
>>
>>
>>
>>> On Wed, Jan 9, 2013 at 7:07 AM, Suraj Menon <surajsmenon@apache.org> wrote:
>>> > Hi Apurv, yes, those are pending test cases to be fixed. GraphJobRunner
>>> > is expecting the input in the format of Vertex, but we have input files,
>>> > as well as record keys and values, defined as Text. I have fixed only one
>>> > unit test case so far.
>>> >
>>> > On Tue, Jan 8, 2013 at 4:45 PM, Apurv Verma <dapurv5@gmail.com> wrote:
>>> >
>>> >> Hey all,
>>> >>  I found the problem: the partitioner was not being set for the
>>> >> PartitionerRunner bsp task. :P I have fixed the partitioner with portions
>>> >> from your patch, Suraj. After this commit the partitioner will obey what
>>> >> you specified earlier. Just to recapitulate:
>>> >>
>>> >> Repartitioning is done if:
>>> >> - the number of splits found is not equal to the number of BSP tasks
>>> >> configured for the job, OR
>>> >> - the flag is set to true by the user ("bsp.input.runtime.partitioning"), OR
>>> >> - the user has specified a Runtime Partitioner class and enabled runtime
>>> >> partitioning
>>> >> There was one special thing that I discovered about the partitioner; just
>>> >> sharing it with you guys. Suppose I implement a partitioner which returns 0
>>> >> for a record. It isn't guaranteed that this record will go to the peer with
>>> >> index 0; it might go to peer 1. The only guarantee a partitioner provides
>>> >> is that all records returning 0 will go to the same peer. I needed the
>>> >> partitioner to work for the PrefixSum I was implementing.
>>> >>
>>> >> Things to do next.
>>> >> 1) RecordConverter , which Suraj is implementing in HAMA-700. (Please
>>> >> update Suraj)
>>> >>
>>> >> B.T.W there are problems in mvn test.
>>> >> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
>>> >> org.apache.hadoop.io.ArrayWritable
>>> >>   at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:287)
>>> >>
>>> >> I don't think my commit is breaking this.
>>> >>
>>> >> Thanks
>>> >>
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Apurv Verma
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Jan 8, 2013 at 11:07 PM, Suraj Menon <surajsmenon@apache.org> wrote:
>>> >>
>>> >> > Could you explain the nature of the problems you are facing with the Partitioner?
>>> >> >
>>> >> > >Any reasons for deciding to move the
>>> >> > > PartitioningJob inside BSPJobClient from BSPJob?
>>> >> >
>>> >> > Twofold: BSPJob was just a configuration holder object, and I didn't want
>>> >> > to add the partitioning responsibility to that class.
>>> >> > Also, I wanted to know the number of splits before deciding whether to
>>> >> > repartition or not.
>>> >> > Repartitioning is done if:
>>> >> > - the number of splits found is not equal to the number of BSP tasks
>>> >> > configured for the job, OR
>>> >> > - the flag is set to true by the user ("bsp.input.runtime.partitioning"), OR
>>> >> > - the user has specified a Runtime Partitioner class and enabled runtime
>>> >> > partitioning
>>> >> >
>>> >> > Thanks,
>>> >> > Suraj
>>> >> >
>>> >> > On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <dapurv5@gmail.com> wrote:
>>> >> >
>>> >> > > Thanks, let me have a careful look at it. On a cursory look, I seem to
>>> >> > > understand the basic idea. Any reasons for deciding to move the
>>> >> > > PartitioningJob inside BSPJobClient from BSPJob?
>>> >> > > BTW, the current partitioner doesn't work as intended. Only the default
>>> >> > > partitioner, HashPartitioner, works fine; if I try to plug in a custom
>>> >> > > partitioner, there are problems.
>>> >> > >
>>> >> > > Let's resolve the partitioning completely before the spilling message
>>> >> > > queue.
>>> >> > >
>>> >> > >
>>> >> > > --
>>> >> > > Regards,
>>> >> > > Apurv Verma
>>> >> > >
>>> >> > >
>>> >> > >
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
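
For reference, the repartitioning rule recapitulated in the quoted messages can be sketched in plain Java. This is only an illustration of the three listed conditions, not the code from the patch: the class name RepartitionDecision, the method shouldRepartition, and its parameters are hypothetical, and how the "bsp.input.runtime.partitioning" flag is actually read from the job configuration is not shown in the thread.

```java
public class RepartitionDecision {

    // Flag name as quoted in the thread; reading it from a job
    // configuration is outside the scope of this sketch.
    static final String RUNTIME_PARTITIONING_FLAG = "bsp.input.runtime.partitioning";

    /**
     * Hypothetical helper mirroring the three conditions listed in the thread.
     *
     * @param numSplits                  number of input splits found
     * @param numBspTasks                number of BSP tasks configured for the job
     * @param runtimePartitioningEnabled value of the runtime partitioning flag
     * @param partitionerClassSpecified  whether the user set a runtime partitioner class
     */
    public static boolean shouldRepartition(int numSplits, int numBspTasks,
                                            boolean runtimePartitioningEnabled,
                                            boolean partitionerClassSpecified) {
        // Condition 1: splits and configured tasks do not line up.
        boolean splitsMismatch = numSplits != numBspTasks;
        // Condition 2: the user switched the flag on.
        boolean flagSet = runtimePartitioningEnabled;
        // Condition 3: a partitioner class is set AND partitioning is enabled.
        // Assuming "enabled" refers to the same flag, this is already covered
        // by condition 2; it is kept only to mirror the list in the thread.
        boolean customPartitioner = partitionerClassSpecified && runtimePartitioningEnabled;
        return splitsMismatch || flagSet || customPartitioner;
    }

    public static void main(String[] args) {
        System.out.println(shouldRepartition(1, 2, false, false)); // true: 1 split, 2 tasks
        System.out.println(shouldRepartition(2, 2, false, false)); // false: nothing triggers
        System.out.println(shouldRepartition(2, 2, true, false));  // true: flag is on
        System.out.println(shouldRepartition(2, 2, false, true));  // false: class set, flag off
    }
}
```

Under this reading, a user who wants to skip partitioning (the HAMA-561 case mentioned above) would need both the flag switched off and a split count equal to the task count.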
