hama-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 10:04:21 GMT
Thanks Edward, it looks good.
Tommaso


2013/1/8 Edward J. Yoon <edwardyoon@apache.org>

> Please review this:
>
> http://wiki.apache.org/hama/Partitioning
>
> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > I mean, pre-partitioning or resizing partitions is really important.
> >
> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >> This is a separate topic ...
> >>
> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
> >> small in size but large in computational complexity, such as graph,
> >> sparse matrix, and machine learning algorithms.
> >>
> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >>> Even when the number of splits equals the number of tasks, a
> >>> user-defined partitioning job should still be run, because
> >>> partitioning is not only for resizing partitions (for example, range
> >>> partitioning of an unsorted data set, hash-key partitioning, etc.).
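To make the two partitioning schemes Edward mentions concrete, here is a minimal standalone Java sketch. It is not Hama's Partitioner API: the class and method names (PartitionSketch, hashPartition, rangePartition) and the split points are hypothetical. Hash-key partitioning sends equal keys to the same partition; range partitioning places each record by comparing its key against sorted split points, so the input itself does not need to be sorted. Neither scheme depends on how many input splits exist, which is why such a job can be worth running even when splits and tasks already match.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hama's API): two ways a user-defined
 * partitioning job might assign a record's key to one of numTasks partitions.
 */
public class PartitionSketch {

  /** Hash-key partitioning: equal keys always land in the same partition. */
  static int hashPartition(Object key, int numTasks) {
    // Math.floorMod keeps the result in [0, numTasks) even when hashCode() is negative.
    return Math.floorMod(key.hashCode(), numTasks);
  }

  /**
   * Range partitioning: splitPoints must be sorted ascending and hold
   * numTasks - 1 boundaries; partition i covers [splitPoints[i-1], splitPoints[i]).
   * Each record is placed independently, so unsorted input is fine.
   */
  static int rangePartition(long key, long[] splitPoints) {
    int idx = Arrays.binarySearch(splitPoints, key);
    // binarySearch returns -(insertionPoint) - 1 when the key is not a boundary.
    return idx >= 0 ? idx + 1 : -(idx + 1);
  }

  public static void main(String[] args) {
    long[] splitPoints = {100L, 200L}; // 3 partitions: < 100, [100, 200), >= 200
    long[] unsorted = {250L, 7L, 150L, 99L, 200L};
    for (long k : unsorted) {
      System.out.println(k + " -> range partition " + rangePartition(k, splitPoints)
          + ", hash partition " + hashPartition(k, 3));
    }
  }
}
```

A real partitioning job would apply one of these functions to every record and write each record to the output file for its partition, yielding exactly numTasks splits regardless of how the input was originally laid out.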
> >>>
> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner; that
> >>>>>    is its name at the HEAD (1429573) of trunk. It hasn't been removed,
> >>>>>    but it isn't referenced anywhere else. I can't find any references
> >>>>>    to it in the workspace.
> >>>>>
> >>>>
> >>>> It is referenced in the BSPJob#waitForCompletion function, where it
> >>>> runs as a separate BSP job to create the specified splits.
> >>>>
> >>>>
> >>>>>    2. job.setPartitioner is the same as setting
> >>>>>    "bsp.input.partitioner.class". Anyway, as far as I can tell,
> >>>>>    partitions are not being created, which causes the following:
> >>>>>    if I run the task on the local FS rather than HDFS, there is just
> >>>>>    one input split, and even if I set a partitioner to create two
> >>>>>    partitions and call bsp.setNumTasks(2), this is overridden and
> >>>>>    only one task is executed.
> >>>>>    See BSPJobClient#submitJobInternal(), line 326, which does the
> >>>>>    following:
> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
> >>>>>
> >>>> This job is set to run if the number of splits != the number of
> >>>> tasks, or if forced by the configuration. I can share the current
> >>>> state of my HAMA-700 patch with you.
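The decision rule Suraj describes, together with the override reported in item 2, can be sketched as follows. This is a hypothetical model in plain Java, not the code in BSPJob or BSPJobClient: the class name, method names, and the forced flag are illustrative stand-ins for whatever the actual configuration check is. When the partitioning job does not run, the task count falls back to the split count, mirroring job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks)).

```java
/**
 * Hypothetical model of the rule described in the thread; names are
 * illustrative and do not correspond to Hama's actual methods.
 */
public class SplitDecisionSketch {

  /** Run the separate partitioning job when splits != tasks, or when forced by config. */
  static boolean shouldRunPartitioningJob(int numSplits, int requestedTasks, boolean forced) {
    return forced || numSplits != requestedTasks;
  }

  /** Number of BSP tasks the job ends up with under this rule. */
  static int effectiveTaskCount(int numSplits, int requestedTasks, boolean forced) {
    if (shouldRunPartitioningJob(numSplits, requestedTasks, forced)) {
      // The partitioning job rewrites the input into requestedTasks partitions.
      return requestedTasks;
    }
    // No partitioning job: the task count follows the number of input splits.
    return numSplits;
  }

  public static void main(String[] args) {
    // Local FS case from the thread: one input split, two requested tasks.
    System.out.println(effectiveTaskCount(1, 2, false));
  }
}
```

Under this rule the local-FS case (one split, two requested tasks) should trigger the partitioning job, so main prints 2; a single task, as reported in item 2, is what the fallback branch produces when that step is skipped.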
> >>>>
> >>>>
> >>>>>    3. So here is what I think is happening: the Partitioner is not in
> >>>>>    the code path (try putting a breakpoint inside the partitioner and
> >>>>>    executing a non-graph BSP task), so partitions are not being
> >>>>>    created and writeSplits() is returning 1.
> >>>>>    [writeSplits() returns the number of splits in the input.]
> >>>>>
> >>>>
> >>>> Probably because it is running as a separate process?
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>
> >
>
>
