hama-dev mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Partitioner in Hama
Date Sun, 06 Jan 2013 19:46:38 GMT
   1. I am referring to org.apache.hama.bsp.PartitioningRunner; that is its
   name at HEAD (1429573) of trunk. It hasn't been removed, but nothing else
   refers to it: I can't find a single reference to it in the workspace.
   2. job.setPartitioner is equivalent to setting
   "bsp.input.partitioner.class". Either way, as far as I can tell,
   partitions are not being created, and as a result the following happens.
   If I run the task on the local fs rather than HDFS, there is just one
   input split. Even if I set a partitioner that creates two partitions and
   call job.setNumBspTask(2), this is overridden and only one task is
   executed. See BSPJobClient#submitJobInternal(), line 326, which does:
   job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));

   3. So here is what I think is happening: the Partitioner is not in the
   code path (try putting a breakpoint inside the partitioner and running a
   non-graph BSP task), so partitions are not created and writeSplits()
   returns 1.
   [ writeSplits() returns the number of splits in the input. ]
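For the range-based assignment asked about in the quoted question below (values in one range go to peer 0, the next range to peer 1, and so on), a custom partitioner only has to map a key to a peer index. The following is a minimal, hypothetical sketch: the `Partitioner` interface is declared locally and is only assumed to mirror the shape of Hama's `org.apache.hama.bsp.Partitioner` (`getPartition(key, value, numTasks)`), and `RangePartitioner` with its upper-bound array is an illustrative name, not a Hama class. It assumes integer keys.

```java
import java.util.Arrays;

public class RangePartitionerDemo {

    // Local stand-in for a partitioner interface; assumed to mirror the shape of
    // Hama's org.apache.hama.bsp.Partitioner, declared here so the sketch compiles alone.
    interface Partitioner<K, V> {
        int getPartition(K key, V value, int numTasks);
    }

    // Hypothetical range partitioner: upperBounds[i] is the exclusive upper bound
    // of partition i; keys >= the last bound go to the final partition.
    static class RangePartitioner<V> implements Partitioner<Integer, V> {
        private final int[] upperBounds;

        RangePartitioner(int... upperBounds) {
            this.upperBounds = upperBounds.clone();
            Arrays.sort(this.upperBounds);
        }

        @Override
        public int getPartition(Integer key, V value, int numTasks) {
            // Binary search for the index of the first bound strictly greater than key.
            int lo = 0, hi = upperBounds.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (key < upperBounds[mid]) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            // Clamp in case there are more ranges than tasks.
            return Math.min(lo, numTasks - 1);
        }
    }

    public static void main(String[] args) {
        // Values < 0 -> peer 0, [0, 100) -> peer 1, >= 100 -> peer 2.
        RangePartitioner<String> p = new RangePartitioner<>(0, 100);
        int[] keys = {-5, 0, 42, 99, 100, 5000};
        for (int k : keys) {
            System.out.println(k + " -> peer " + p.getPartition(k, "record", 3));
        }
    }
}
```

Running `main` prints `-5 -> peer 0`, `0 -> peer 1`, `42 -> peer 1`, `99 -> peer 1`, `100 -> peer 2` and `5000 -> peer 2`. In a real job such a class would be registered through job.setPartitioner or "bsp.input.partitioner.class", but, as argued above, it only takes effect if a partitioning pass actually runs before the splits are written.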




--
Regards,
Apurv Verma




On Sun, Jan 6, 2013 at 9:05 PM, Suraj Menon <surajsmenon@apache.org> wrote:

> Are you referring to org.apache.hama.bsp.PartitionRunner? I don't see a
> commit removing the class.
> PartitionRunner is designed to be a Hama job in itself that creates the
> expected splits before the submitted job starts.
> You can use your own Partitioner by setting
> "bsp.input.partitioner.class" in the config. Hopefully that answers your question.
>
> I am trying to make things backward compatible [HAMA-700] but am facing some
> problems. The goal is to have runtime partitioning of graphs done by
> PartitionRunner itself.
>
> -Suraj
>
> On Sun, Jan 6, 2013 at 9:54 AM, Apurv Verma <dapurv5@gmail.com> wrote:
>
> > Hey all,
> > I found that PartitioningRunner has been removed from the code path; I
> > guess this is the right way to make jobs faster.
> > But in the current scenario, is it possible to have something like the
> > following? I want all values below some integer to be assigned to peer
> > index 0, all values in the range 0-a to peer index 1, and so on.
> > With partitioning removed, would I need an additional superstep to do
> > this classification of input records?
> >
> >
> > --
> > Regards,
> > Apurv Verma
> >
>
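If no partitioning pass runs before the job, the alternative raised in the original question is to spend one superstep redistributing records: each peer scans its local input, sends every record to the peer that owns its range, and after the barrier each peer works only on what it received. The sketch below simulates that pattern with plain Java collections; it does not use Hama's peer API, and the peer count, the range bounds and the `owner` function are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class RedistributionSuperstep {

    // Hypothetical owner function: values < 0 -> peer 0, [0, 100) -> peer 1, >= 100 -> peer 2.
    static int owner(int value, int numPeers) {
        int peer = value < 0 ? 0 : (value < 100 ? 1 : 2);
        return Math.min(peer, numPeers - 1);
    }

    // Simulates one superstep: every peer "sends" each local record to its owner,
    // then the barrier delivers the messages. Returns the records each peer holds afterwards.
    static List<List<Integer>> redistribute(List<List<Integer>> localInputs) {
        int numPeers = localInputs.size();
        List<List<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ArrayList<>());
        }
        // Send phase: in a real BSP job these would be messages to other peers.
        for (List<Integer> local : localInputs) {
            for (int value : local) {
                inboxes.get(owner(value, numPeers)).add(value);
            }
        }
        // After the barrier, each inbox is that peer's input for the next superstep.
        return inboxes;
    }

    public static void main(String[] args) {
        // Arbitrary initial split: records land on peers regardless of their value.
        List<List<Integer>> input = List.of(
                List.of(150, -3, 7),
                List.of(42, 999),
                List.of(-1, 100));
        List<List<Integer>> after = redistribute(input);
        for (int peer = 0; peer < after.size(); peer++) {
            System.out.println("peer " + peer + ": " + after.get(peer));
        }
    }
}
```

The cost of this approach is one extra superstep in which every record may cross the network once, which is the overhead that a separate partitioning job run before the submitted job is meant to avoid.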
