hama-dev mailing list archives

From Suraj Menon <surajsme...@apache.org>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 23:56:22 GMT
> Keeping run-time (network-based) partitioning within GraphJobRunner is
> not a good idea.


It is not. In the current state of my patch, I think I got testSubmitGraph
to partition the single input file into 2 files at runtime (in a
preprocessing step) in the unit tests.


> >> - the number of splits found are not equal to the number of BSP tasks
> >> configured for the job. OR
>
> I have a question. If the input is unsorted map and I want to
> re-partition by hashing but the numbers of blocks and desired tasks
> are same, then what happens? Do you mean run-time partitioning?

You will have a runtime partitioner class defined and the partitioning flag
on by default. For the HAMA-561 case, a user can switch off partitioning
using the same flag.
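
To make the repartitioning conditions quoted in this thread concrete, here is
a minimal sketch in plain Java. It is not the code from the patch: the class
name, method name, and parameters are hypothetical, and only the flag name
"bsp.input.runtime.partitioning" comes from the thread.

```java
public class RepartitionDecision {
    // Flag name as quoted in this thread; everything else here is hypothetical.
    static final String RUNTIME_PARTITIONING_FLAG = "bsp.input.runtime.partitioning";

    /**
     * Returns true if any of the three conditions listed in the thread holds.
     * flagEnabled stands for the value of RUNTIME_PARTITIONING_FLAG.
     */
    static boolean shouldRepartition(int numSplits, int numBspTasks,
                                     boolean flagEnabled, boolean partitionerClassSet) {
        // Condition 1: splits found do not match the configured BSP tasks.
        boolean splitMismatch = numSplits != numBspTasks;
        // Condition 3: a runtime partitioner class is set and partitioning is enabled.
        boolean partitionerRequested = partitionerClassSet && flagEnabled;
        // Condition 2 is the flag itself.
        return splitMismatch || flagEnabled || partitionerRequested;
    }

    public static void main(String[] args) {
        // Same number of blocks and tasks, flag left on: repartition.
        System.out.println(shouldRepartition(2, 2, true, true));
        // Same number of blocks and tasks, flag switched off: no repartitioning.
        System.out.println(shouldRepartition(2, 2, false, true));
        // One split but two tasks: repartition regardless of the flag.
        System.out.println(shouldRepartition(1, 2, false, false));
    }
}
```

Under these assumptions, the case asked about above (equal numbers of blocks
and tasks) is decided by the flag alone.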



> On Wed, Jan 9, 2013 at 7:07 AM, Suraj Menon <surajsmenon@apache.org>
> wrote:
> > Hi Apurv, yes, those are pending test cases to be fixed. GraphJobRunner is
> > expecting the input in the format of Vertex, but we have input files as
> > well as record keys and values defined as Text. I have fixed only one unit
> > test case so far.
> >
> > On Tue, Jan 8, 2013 at 4:45 PM, Apurv Verma <dapurv5@gmail.com> wrote:
> >
> >> Hey all,
> >> I found the problem: the partitioner was not being set for the
> >> PartitionerRunner bsp task. :P I have fixed the partitioner with portions
> >> from your patch, Suraj. After this commit the partitioner will obey what
> >> you specified earlier. Just to recapitulate:
> >>
> >> Repartitioning is done if :
> >> - the number of splits found are not equal to the number of BSP tasks
> >> configured for the job. OR
> >> - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
> >> - user has specified a Runtime Partitioner class and enabled runtime
> >> partitioning
> >>
> >> There was one special thing that I discovered about the partitioner, just
> >> sharing it with you guys. Suppose I implement a partitioner which returns 0
> >> for a record; it isn't guaranteed that this record will go to the peer with
> >> index 0. It might go to peer 1. The only guarantee that partitioners
> >> provide is that all records returning 0 will go to the same peer. I needed
> >> the partitioner to work for the PrefixSum I was implementing.
> >>
> >> Things to do next.
> >> 1) RecordConverter , which Suraj is implementing in HAMA-700. (Please
> >> update Suraj)
> >>
> >> B.T.W there are problems in mvn test.
> >> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.ArrayWritable
> >>   at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:287)
> >> I don't think my commit is breaking this.
> >>
> >> Thanks
> >>
> >>
> >> --
> >> Regards,
> >> Apurv Verma
> >>
> >>
> >>
> >>
> >> On Tue, Jan 8, 2013 at 11:07 PM, Suraj Menon <surajsmenon@apache.org>
> >> wrote:
> >>
> >> > Please explain the nature of problems you are facing with Partitioner?
> >> >
> >> > >Any reasons for deciding to move the
> >> > > PartitioningJob inside BSPJobClient from BSPJob?
> >> >
> >> > Twofold: BSPJob was just a configuration holder object, and I didn't
> >> > want to add the partitioning responsibility to the class.
> >> > Also, I wanted to know the number of splits before deciding
> >> > whether to repartition or not.
> >> > Repartitioning is done if :
> >> > - the number of splits found are not equal to the number of BSP tasks
> >> > configured for the job. OR
> >> > - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
> >> > - user has specified a Runtime Partitioner class and enabled runtime
> >> > partitioning
> >> >
> >> > Thanks,
> >> > Suraj
> >> >
> >> > On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <dapurv5@gmail.com> wrote:
> >> >
> >> > > Thanks, let me have a careful look at it. On a cursory look, I seem to
> >> > > understand the basic idea. Any reasons for deciding to move the
> >> > > PartitioningJob inside BSPJobClient from BSPJob?
> >> > > BTW the current partitioner doesn't work as intended; only the default
> >> > > partitioner, HashPartitioner, works fine. If I try to plug in a custom
> >> > > partitioner, there are problems.
> >> > >
> >> > > Let's resolve the partitioning completely before the spilling message
> >> > > queue.
> >> > >
> >> > >
> >> > > --
> >> > > Regards,
> >> > > Apurv Verma
> >> > >
> >> > >
> >> > >
>
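
The observation quoted in this thread, that a partitioner returning 0 does
not pin a record to peer 0 but does keep equal partition ids together, can be
sketched as below. This is an illustration only, not Hama's actual assignment
logic: the class, the hash-based partitionOf, and the rotate-by-one peerFor
mapping are all hypothetical and chosen just to show the property.

```java
public class PartitionToPeerSketch {
    /** Hypothetical hash partitioner: maps a key to a partition id in [0, numPartitions). */
    static int partitionOf(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Hypothetical partition-id-to-peer mapping. Any fixed mapping preserves
     * the guarantee; a rotation by one is used so that id 0 is not peer 0.
     */
    static int peerFor(int partitionId, int numPeers) {
        return (partitionId + 1) % numPeers;
    }

    public static void main(String[] args) {
        int numPeers = 2;
        // Partition id 0 lands on peer 1 under this mapping, not peer 0.
        System.out.println("partition 0 -> peer " + peerFor(0, numPeers));
        // Two records with the same key get the same partition id, hence the same peer.
        int first = peerFor(partitionOf("vertex-a", numPeers), numPeers);
        int second = peerFor(partitionOf("vertex-a", numPeers), numPeers);
        System.out.println("same peer: " + (first == second));
    }
}
```

Code that depends on which peer holds which partition (as a prefix sum might)
would therefore need to learn the mapping rather than assume that the
partition id equals the peer index.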
