hama-dev mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 14:21:04 GMT
Hey Edward,
There was a compile error, which I have fixed temporarily: isPartitioned was not
being initialized. Could you please check the last commit? I have currently
initialized it to false, but I guess this should be configurable.
There was a JIRA issue where we wanted partitioning to be skipped if the user
thinks his data is already partitioned.
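
A minimal sketch of how the flag could be made configurable while keeping false as the default. The property key "bsp.input.is.partitioned" and the class name are hypothetical, not taken from the Hama code base, and a plain java.util.Properties stands in for the job configuration so the snippet is self-contained:

```java
import java.util.Properties;

/** Hypothetical sketch: reads an "already partitioned" flag with a false default. */
public class PartitionFlagSketch {
  // Assumed key name, chosen for illustration only.
  static final String IS_PARTITIONED_KEY = "bsp.input.is.partitioned";

  /** Returns the configured flag, or false when the key is absent. */
  static boolean isPartitioned(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(IS_PARTITIONED_KEY, "false"));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(isPartitioned(conf)); // key not set, falls back to the default
    conf.setProperty(IS_PARTITIONED_KEY, "true");
    System.out.println(isPartitioned(conf)); // user declares the input pre-partitioned
  }
}
```

With a default like this, the field is always initialized, and a user who knows the input is already partitioned can opt out of the partitioning step explicitly.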

Thanks again.


--
Regards,
Apurv Verma




On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> Thanks, then I'll finish tomorrow. Please feel free to comment there.
>
> On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili
> <tommaso.teofili@gmail.com> wrote:
> > thanks Edward, it looks good.
> > Tommaso
> >
> >
> > 2013/1/8 Edward J. Yoon <edwardyoon@apache.org>
> >
> >> Please review this:
> >>
> >> http://wiki.apache.org/hama/Partitioning
> >>
> >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <edwardyoon@apache.org>
> >> wrote:
> >> > I mean, the pre-partitioning or resizing partitions is really important.
> >> >
> >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >> >> This is another talk ...
> >> >>
> >> >> Unlike MapReduce, I think, Hama BSP will handle tasks that input is
> >> >> small in size but large in computational complexity, such as graph,
> >> >> sparse matrix, machine learning algorithms.
> >> >>
> >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >> >>> Even though the numbers of splits and tasks are the same, user-defined
> >> >>> partitioning job should be run (because it is not only for resizing
> >> >>> partitions. For example, range partitioning of unsorted data set or
> >> >>> hash key partitioning, ..., etc).
> >> >>>
> >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
> >> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's named
> >> >>>>>    as so in the HEAD (1429573) of trunk. It isn't removed but it isn't
> >> >>>>>    referred to anywhere else. I can't find any references to it in the
> >> >>>>>    workspace.
> >> >>>>>
> >> >>>>
> >> >>>> It is referred in BSPJob#waitForCompletion function as a separate BSP job
> >> >>>> to create the specified splits.
> >> >>>>
> >> >>>>
> >> >>>>>    2. job.setPartitioner is the same as setting
> >> >>>>>    "bsp.input.partitioner.class". Anyways, so acc. to me partitions are not
> >> >>>>>    being created, because of which the following happens.
> >> >>>>>    If I am running the task on local fs and not hdfs, there's just one
> >> >>>>>    input split and even if I set a partitioner to create two partitions and
> >> >>>>>    set bsp.setNumTasks(2), this is overridden and only one task is executed.
> >> >>>>>    See BSPJobClient#submitJobInternal()
> >> >>>>>    where it does the following
> >> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks)); Line
> >> >>>>>    326.
> >> >>>>>
> >> >>>> This job is set to run if the number of splits != number of Tasks or if
> >> >>>> forced by the configuration. I can share my HAMA-700 current state of patch
> >> >>>> with you.
> >> >>>>
> >> >>>>
> >> >>>>>    3. So here is what I think is happening, Partitioner is not in the
> >> >>>>>    codepath (try putting a breakpoint inside the partitioner and executing
> >> >>>>>    and non graph bsp task), so partitions are not being created and
> >> >>>>>    writeSplits() is returning 1.
> >> >>>>>    [ writeSplits() returns the number of splits in the input. ]
> >> >>>>>
> >> >>>>
> >> >>>> Probably because it is running as a separate process?
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
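
The rule discussed in the quoted thread (run the partitioning job when the number of splits differs from the number of tasks or when forced by configuration, and skip it when the user declares the data already partitioned) can be sketched roughly as follows. This is an illustration only, not the Hama implementation: the class and method names are hypothetical, and letting the "already partitioned" flag take precedence over the forced setting is an assumption. The hash helper shows one common form of the hash key partitioning mentioned above.

```java
/** Hypothetical sketch of the partitioning decision discussed in the thread. */
public class PartitioningDecisionSketch {

  /**
   * Decides whether a separate partitioning job should run.
   * Assumption: a user-declared "already partitioned" flag wins over everything else.
   */
  static boolean shouldRunPartitioning(int numSplits, int numTasks,
                                       boolean forced, boolean alreadyPartitioned) {
    if (alreadyPartitioned) {
      return false; // user says the input needs no repartitioning
    }
    // Forced covers user-defined partitioners (range, hash key, ...) that must
    // run even when the split and task counts already match.
    return forced || numSplits != numTasks;
  }

  /** Maps a key to a task index in [0, numTasks) by its hash code. */
  static int hashPartition(Object key, int numTasks) {
    return Math.floorMod(key.hashCode(), numTasks);
  }

  public static void main(String[] args) {
    // One local input split but two requested tasks: partitioning is needed.
    System.out.println(shouldRunPartitioning(1, 2, false, false));
    System.out.println(hashPartition("vertex-42", 2));
  }
}
```

Math.floorMod is used instead of % so that keys with negative hash codes still map to a valid task index.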
