hama-dev mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 21:45:51 GMT
Hey all,
 I found the problem: the partitioner was not being set for the
PartitioningRunner BSP task. :P I have fixed the partitioner using portions
of your patch, Suraj. After this commit, the partitioner obeys the rules you
specified earlier. To recapitulate:

Repartitioning is done if:
- the number of splits found is not equal to the number of BSP tasks
configured for the job, OR
- the flag is set to true by the user ("bsp.input.runtime.partitioning"), OR
- the user has specified a runtime Partitioner class and enabled runtime
partitioning
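
The three conditions above can be sketched as a single predicate. This is
only an illustration of the rule as stated here, not the code in the commit:
the class name, method name, and parameters are hypothetical, and how the
real job client reads the flag from its configuration is not shown.

```java
public final class RepartitionCheck {
    // Configuration key quoted in this message. How the job client reads it
    // is an assumption and is not modelled here.
    static final String RUNTIME_PARTITIONING_KEY = "bsp.input.runtime.partitioning";

    /**
     * Hypothetical helper: returns true when any of the three conditions
     * listed in the message holds.
     */
    static boolean shouldRepartition(int numSplits,
                                     int numBspTasks,
                                     boolean runtimePartitioningEnabled,
                                     boolean customPartitionerSet) {
        // Condition 1: the input splits do not line up with the task count.
        boolean sizeMismatch = numSplits != numBspTasks;
        // Condition 3: a custom partitioner was given and runtime
        // partitioning is enabled.
        boolean customRequested = customPartitionerSet && runtimePartitioningEnabled;
        // Condition 2 is the flag itself.
        return sizeMismatch || runtimePartitioningEnabled || customRequested;
    }

    public static void main(String[] args) {
        System.out.println(shouldRepartition(1, 2, false, false)); // true
        System.out.println(shouldRepartition(2, 2, false, false)); // false
        System.out.println(shouldRepartition(2, 2, true, true));   // true
    }
}
```

As written, the third condition implies the second, so the flag alone decides
the outcome whenever the split and task counts already match.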

I discovered one notable thing about the partitioner that I want to share
with you. Suppose I implement a partitioner that returns 0 for a record. That
does not guarantee the record will go to the peer with index 0; it might go
to peer 1. The only guarantee a partitioner provides is that all records for
which it returns 0 will go to the same peer. I needed the partitioner to work
for the PrefixSum I was implementing.
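
A minimal sketch of that guarantee, under loud assumptions: the `rotation`
parameter below is a made-up stand-in for whatever assignment the framework
performs internally, and `peerFor` is a hypothetical name, not a Hama API. It
only shows that a partition ID need not equal the peer index while equal IDs
still land on the same peer.

```java
public final class PartitionToPeerSketch {
    /**
     * Hypothetical mapping from a partition ID to a peer index. The rotation
     * models an assignment that the partitioner author does not control.
     */
    static int peerFor(int partitionId, int numPeers, int rotation) {
        return Math.floorMod(partitionId + rotation, numPeers);
    }

    public static void main(String[] args) {
        int numPeers = 2;
        int rotation = 1; // assumed; the real assignment is framework-defined

        // Two different records for which the partitioner returned 0.
        int peerOfFirst = peerFor(0, numPeers, rotation);
        int peerOfSecond = peerFor(0, numPeers, rotation);

        System.out.println(peerOfFirst);                 // 1, not 0
        System.out.println(peerOfFirst == peerOfSecond); // true
    }
}
```

Code that depends on which peer holds which partition should therefore not
assume that the partition ID equals the peer index.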

Things to do next:
1) RecordConverter, which Suraj is implementing in HAMA-700. (Suraj, please
post an update.)

BTW, mvn test is failing:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
org.apache.hadoop.io.ArrayWritable
 at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:287)

I don't think my commit caused this.

Thanks


--
Regards,
Apurv Verma




On Tue, Jan 8, 2013 at 11:07 PM, Suraj Menon <surajsmenon@apache.org> wrote:

> Please explain the nature of problems you are facing with Partitioner?
>
> >Any reasons for deciding to move the
> > PartitioningJob inside BSPJobClient from BSPJob?
>
> Twofold, BSPJob was just a configuration holder object, didn't want to add
> the partitioning responsibility to the class.
> And also I wanted to know the number of splits, before taking the decision
> whether to repartition or not.
> Repartitioning is done if :
> - the number of splits found are not equal to the number of BSP tasks
> configured for the job. OR
> - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
> - user has specified a Runtime Partitioner class and enabled runtime
> partitioning
>
> Thanks,
> Suraj
>
> On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <dapurv5@gmail.com> wrote:
>
> > Thanks, let me have a careful look at it. On a cursory look, I seem to
> > understand the basic idea. Any reasons for deciding to move the
> > PartitioningJob inside BSPJobClient from BSPJob?
> > BTW the current partitioner doesn't work as intended, only the default
> > partitioner HashPartitioner works fine, if I try to put some custom
> > partitioner there are problems.
> >
> > Let's resolve the partitioning completely before the spilling message
> > queue.
> >
> >
> > --
> > Regards,
> > Apurv Verma
> >
> >
> >
> >
> > On Tue, Jan 8, 2013 at 8:39 PM, Suraj Menon <surajsmenon@apache.org>
> > wrote:
> >
> > > Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide
> > > suggestions or even work on it.
> > >
> > > Thanks,
> > > Suraj
> > >
> > > On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <dapurv5@gmail.com> wrote:
> > >
> > > > Hey Edward,
> > > >  There was a compile bug which i fixed temporarily. isPartitioned was
> > not
> > > > being initialized. Could you please check the last commit. I have
> > > currently
> > > > initialized it to false but I guess this should be configurable.
> > > > There was some jira where we wanted partitioning to be skipped if
> user
> > > > thinks his data is already partitioned.
> > > >
> > > > Thanks again.
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Apurv Verma
> > > >
> > > >
> > > >
> > > >
> > > > > On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <edwardyoon@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Thanks, then I'll finish tomorrow. Please feel free to comment there.
> > > > >
> > > > > On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili
> > > > > <tommaso.teofili@gmail.com> wrote:
> > > > > > thanks Edward, it looks good.
> > > > > > Tommaso
> > > > > >
> > > > > >
> > > > > > 2013/1/8 Edward J. Yoon <edwardyoon@apache.org>
> > > > > >
> > > > > >> Please review this:
> > > > > >>
> > > > > >> http://wiki.apache.org/hama/Partitioning
> > > > > >>
> > > > > > >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <edwardyoon@apache.org>
> > > > > > >> wrote:
> > > > > > >> > I mean, the pre-partitioning or resizing partitions is really
> > > > > > >> > important.
> > > > > >> >
> > > > > > >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org>
> > > > > > >> > wrote:
> > > > > >> >> This is another talk ...
> > > > > >> >>
> > > > > > >> >> Unlike MapReduce, I think, Hama BSP will handle tasks that input is
> > > > > > >> >> small in size but large in computational complexity, such as graph,
> > > > > > >> >> sparse matrix, machine learning algorithms.
> > > > > >> >>
> > > > > > >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org>
> > > > > > >> >> wrote:
> > > > > > >> >>> Even though the numbers of splits and tasks are the same, user-defined
> > > > > > >> >>> partitioning job should be run (because it is not only for resizing
> > > > > > >> >>> partitions. For example, range partitioning of unsorted data set or
> > > > > > >> >>> hash key partitioning, ..., etc).
> > > > > >> >>>
> > > > > > >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org>
> > > > > > >> >>> wrote:
> > > > > > >> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's
> > > > > > >> >>>>>    named as so in the HEAD (1429573) of trunk. It isn't removed but it
> > > > > > >> >>>>>    isn't referred to anywhere else. I can't find any references to it
> > > > > > >> >>>>>    in the workspace.
> > > > > >> >>>>>
> > > > > >> >>>>
> > > > > > >> >>>> It is referred in BSPJob#waitForCompletion function as a separate BSP
> > > > > > >> >>>> job to create the specified splits.
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>>>    2. job.setPartitioner is the same
as setting
> > > > > >> >>>>>    "bsp.input.partitioner.class" .
Anyways , So acc. to me
> > > > > >> partitions are
> > > > > >> >>>>> not
> > > > > >> >>>>>    being created because of which the
following happens.
> > > > > >> >>>>>    If I am running the task on local
fs and not hdfs,
> > there's
> > > > just
> > > > > >> one
> > > > > >> >>>>>    input split and even if I set a
partitioner to create
> two
> > > > > >> partitions and
> > > > > >> >>>>>    set bsp.setNumTasks(2) , this is
overriden and only one
> > > task
> > > > is
> > > > > >> >>>>> executed.
> > > > > >> >>>>>    See BSPJobClient#submitJobInternal()
> > > > > >> >>>>>    where it does the following
> > > > > >> >>>>>    job.setNumBspTask(writeSplits(job,
submitSplitFile,
> > > > maxTasks));
> > > > > >> Line
> > > > > >> >>>>>    326.
> > > > > >> >>>>>
> > > > > > >> >>>> This job is set to run if the number of splits != number of Tasks or
> > > > > > >> >>>> if forced by the configuration. I can share my HAMA-700 current state
> > > > > > >> >>>> of patch with you.
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>>>    3. So here is what I think is happening,
Partitioner is
> > not
> > > > in
> > > > > the
> > > > > >> >>>>>    codepath (try putting a breakpoint
inside the
> partitioner
> > > and
> > > > > >> executing
> > > > > >> >>>>> and
> > > > > >> >>>>>    non graph bsp task), so partitions
are not being
> created
> > > and
> > > > > >> >>>>> writeSplits()
> > > > > >> >>>>>    is returning 1.
> > > > > >> >>>>>    [ writeSplits() returns the number
of splits in the
> > input.
> > > ]
> > > > > >> >>>>>
> > > > > >> >>>>
> > > > > > >> >>>> Probably because it is running as a separate process?
> > > > > >> >>>
> > > > > >> >>>
> > > > > >> >>>
> > > > > >> >>> --
> > > > > >> >>> Best Regards, Edward J. Yoon
> > > > > >> >>> @eddieyoon
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> --
> > > > > >> >> Best Regards, Edward J. Yoon
> > > > > >> >> @eddieyoon
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> > Best Regards, Edward J. Yoon
> > > > > >> > @eddieyoon
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Best Regards, Edward J. Yoon
> > > > > >> @eddieyoon
> > > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards, Edward J. Yoon
> > > > > @eddieyoon
> > > > >
> > > >
> > >
> >
>
