hama-dev mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 16:31:49 GMT
Thanks, let me have a careful look at it. At a cursory glance, I think I
understand the basic idea. Is there any reason for moving the
PartitioningJob from BSPJob into BSPJobClient?
BTW, the current partitioning doesn't work as intended: only the default
partitioner, HashPartitioner, works fine. If I plug in a custom
partitioner, there are problems.
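To make concrete what a partitioner computes (and why a custom one behaves differently from the hash default), here is a minimal, self-contained Java sketch. It maps a record to one of `numTasks` partitions using hash partitioning and the range partitioning Edward mentions further down in the thread. `SimplePartitioner`, `HashPartitionerSketch` and `RangePartitionerSketch` are hypothetical names chosen for illustration. They are not Hama's API, and Hama's own `Partitioner` interface and `HashPartitioner` may differ in signature and details.

```java
import java.util.Arrays;

// Hypothetical stand-in for a partitioner contract (not Hama's actual interface):
// maps a record to a partition index in [0, numTasks).
interface SimplePartitioner<K, V> {
  int getPartition(K key, V value, int numTasks);
}

// Hash partitioning: the same idea as a default hash-based partitioner.
class HashPartitionerSketch<K, V> implements SimplePartitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numTasks) {
    // Mask off the sign bit so negative hash codes still land in [0, numTasks).
    return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
  }
}

// A custom range partitioner: sorted split points define contiguous key ranges,
// so keys of an unsorted input end up grouped by range.
class RangePartitionerSketch<V> implements SimplePartitioner<Integer, V> {
  // Each split point is the inclusive lower bound of the next range.
  private final int[] splitPoints;

  RangePartitionerSketch(int... splitPoints) {
    this.splitPoints = splitPoints.clone();
    Arrays.sort(this.splitPoints);
  }

  @Override
  public int getPartition(Integer key, V value, int numTasks) {
    int idx = Arrays.binarySearch(splitPoints, key);
    // binarySearch returns -(insertionPoint) - 1 when the key is not a split point.
    int range = idx >= 0 ? idx + 1 : -idx - 1;
    // Clamp in case fewer tasks were requested than there are ranges.
    return Math.min(range, numTasks - 1);
  }
}
```

For example, `new RangePartitionerSketch<String>(10, 20)` sends keys below 10 to partition 0, keys in [10, 20) to partition 1, and keys of 20 or more to partition 2. Unlike the hash version, the result depends on user-supplied boundaries, so a custom partitioner only takes effect if the partitioning step actually runs it.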

Let's resolve the partitioning completely before the spilling message queue.


--
Regards,
Apurv Verma




On Tue, Jan 8, 2013 at 8:39 PM, Suraj Menon <surajsmenon@apache.org> wrote:

> Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide
> suggestions or even work on it.
>
> Thanks,
> Suraj
>
> On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <dapurv5@gmail.com> wrote:
>
> > Hey Edward,
> >  There was a compile bug, which I fixed temporarily: isPartitioned was not
> > being initialized. Could you please check the last commit? I have
> > currently initialized it to false, but I guess this should be configurable.
> > There was some JIRA issue where we wanted partitioning to be skipped if the
> > user thinks his data is already partitioned.
> >
> > Thanks again.
> >
> >
> > --
> > Regards,
> > Apurv Verma
> >
> >
> >
> >
> > On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >
> > > Thanks, then I'll finish tomorrow. Please feel free to comment there.
> > >
> > > On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili
> > > <tommaso.teofili@gmail.com> wrote:
> > > > thanks Edward, it looks good.
> > > > Tommaso
> > > >
> > > >
> > > > 2013/1/8 Edward J. Yoon <edwardyoon@apache.org>
> > > >
> > > >> Please review this:
> > > >>
> > > >> http://wiki.apache.org/hama/Partitioning
> > > >>
> > > >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > > >> > I mean, pre-partitioning or resizing partitions is really important.
> > > >> >
> > > >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > > >> >> This is another talk ...
> > > >> >>
> > > >> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
> > > >> >> small in size but large in computational complexity, such as graph,
> > > >> >> sparse matrix, and machine learning algorithms.
> > > >> >>
> > > >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> > > >> >>> Even though the numbers of splits and tasks are the same, a
> > > >> >>> user-defined partitioning job should still be run, because it is not
> > > >> >>> only for resizing partitions (for example, range partitioning of an
> > > >> >>> unsorted data set, hash key partitioning, etc.).
> > > >> >>>
> > > >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
> > > >> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's
> > > >> >>>>>    named so in the HEAD (1429573) of trunk. It hasn't been removed,
> > > >> >>>>>    but it isn't referred to anywhere else. I can't find any
> > > >> >>>>>    references to it in the workspace.
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>> It is referred to in BSPJob#waitForCompletion, which runs it as a
> > > >> >>>> separate BSP job to create the specified splits.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>>    2. job.setPartitioner is the same as setting
> > > >> >>>>>    "bsp.input.partitioner.class". Anyway, as far as I can tell,
> > > >> >>>>>    partitions are not being created, which leads to the following.
> > > >> >>>>>    If I run the task on the local FS rather than HDFS, there's just
> > > >> >>>>>    one input split, and even if I set a partitioner to create two
> > > >> >>>>>    partitions and call bsp.setNumTasks(2), this is overridden and
> > > >> >>>>>    only one task is executed.
> > > >> >>>>>    See BSPJobClient#submitJobInternal(), where it does the following
> > > >> >>>>>    at line 326:
> > > >> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
> > > >> >>>>>
> > > >> >>>> This job is set to run if the number of splits != the number of tasks,
> > > >> >>>> or if forced by the configuration. I can share the current state of my
> > > >> >>>> HAMA-700 patch with you.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>>    3. So here is what I think is happening: the Partitioner is not in
> > > >> >>>>>    the code path (try putting a breakpoint inside the partitioner and
> > > >> >>>>>    executing a non-graph BSP task), so partitions are not being
> > > >> >>>>>    created and writeSplits() is returning 1.
> > > >> >>>>>    [writeSplits() returns the number of splits in the input.]
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>> Probably because it is running as a separate process?
> > > >> >>>
> > > >> >>>
> > > >> >>>
> > > >> >>> --
> > > >> >>> Best Regards, Edward J. Yoon
> > > >> >>> @eddieyoon
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Best Regards, Edward J. Yoon
> > > >> >> @eddieyoon
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Best Regards, Edward J. Yoon
> > > >> > @eddieyoon
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Best Regards, Edward J. Yoon
> > > >> @eddieyoon
> > > >>
> > >
> > >
> > >
> > > --
> > > Best Regards, Edward J. Yoon
> > > @eddieyoon
> > >
> >
>
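The thread circles around one question: when should the separate partitioning job run? Suraj describes running it when the number of splits differs from the number of tasks, or when it is forced by configuration. Edward argues that a user-defined partitioner should run even when the counts match. Apurv mentions wanting to skip partitioning when the user knows the data is already partitioned. The following minimal Java sketch combines these rules in one place. The class, method, parameter names and the order of precedence are all hypothetical. They are not the actual BSPJobClient/BSPJob logic or Hama configuration keys.

```java
// Hypothetical decision helper; not part of Hama.
final class PartitioningDecision {
  private PartitioningDecision() {}

  /**
   * @param numSplits          number of input splits (what writeSplits() reports)
   * @param numTasks           number of BSP tasks the user requested
   * @param forced             hypothetical config switch that forces the partitioning job
   * @param customPartitioner  whether a user-defined partitioner was configured
   * @param alreadyPartitioned hypothetical user flag: input is already partitioned
   */
  static boolean shouldRunPartitioningJob(int numSplits, int numTasks, boolean forced,
      boolean customPartitioner, boolean alreadyPartitioned) {
    if (forced) {
      return true; // explicit configuration wins over every other rule
    }
    if (alreadyPartitioned) {
      return false; // user asserts the layout is correct; skip the extra job
    }
    if (numSplits != numTasks) {
      return true; // resize partitions so each task gets exactly one split
    }
    // Counts match, but a custom partitioner (e.g. range partitioning of an
    // unsorted data set) still has to redistribute records.
    return customPartitioner;
  }
}
```

In this sketch, the local-FS case from the thread (one split, two requested tasks) returns `true`, so the job would run and the task count would not collapse to 1. Letting `forced` override `alreadyPartitioned` is a judgment call; the opposite order is equally defensible.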
