hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 10:14:50 GMT
Thanks, then I'll finish tomorrow. Please feel free to comment there.

On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili
<tommaso.teofili@gmail.com> wrote:
> thanks Edward, it looks good.
> Tommaso
>
>
> 2013/1/8 Edward J. Yoon <edwardyoon@apache.org>
>
>> Please review this:
>>
>> http://wiki.apache.org/hama/Partitioning
>>
>> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <edwardyoon@apache.org>
>> wrote:
>> > I mean, pre-partitioning or resizing partitions is really important.
>> >
>> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >> This is another talk ...
>> >>
>> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
>> >> small in size but large in computational complexity, such as graph,
>> >> sparse matrix, and machine learning algorithms.
>> >>
>> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>> Even when the numbers of splits and tasks are the same, a user-defined
>> >>> partitioning job should still be run, because partitioning is not only
>> >>> for resizing partitions. For example, range partitioning of an unsorted
>> >>> data set, hash key partitioning, etc.
>> >>>
>> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
>> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's
>> >>>>>    named as so in the HEAD (1429573) of trunk. It isn't removed but it
>> >>>>>    isn't referred to anywhere else. I can't find any references to it in
>> >>>>>    the workspace.
>> >>>>>
>> >>>>
>> >>>> It is referred to in the BSPJob#waitForCompletion function as a separate
>> >>>> BSP job to create the specified splits.
>> >>>>
>> >>>>
>> >>>>>    2. job.setPartitioner is the same as setting
>> >>>>>    "bsp.input.partitioner.class". Anyway, as I see it, partitions are
>> >>>>>    not being created, because of which the following happens.
>> >>>>>    If I am running the task on local fs and not hdfs, there's just one
>> >>>>>    input split, and even if I set a partitioner to create two partitions
>> >>>>>    and set bsp.setNumTasks(2), this is overridden and only one task is
>> >>>>>    executed.
>> >>>>>    See BSPJobClient#submitJobInternal(),
>> >>>>>    where it does the following:
>> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>> >>>>>    Line 326.
>> >>>>>
>> >>>> This job is set to run if the number of splits != number of tasks or if
>> >>>> forced by the configuration. I can share the current state of my HAMA-700
>> >>>> patch with you.
>> >>>>
>> >>>>
>> >>>>>    3. So here is what I think is happening: the Partitioner is not in the
>> >>>>>    codepath (try putting a breakpoint inside the partitioner and executing
>> >>>>>    any non-graph bsp task), so partitions are not being created and
>> >>>>>    writeSplits() is returning 1.
>> >>>>>    [ writeSplits() returns the number of splits in the input. ]
>> >>>>>
>> >>>>
>> >>>> Probably because it is running as a separate process?
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>>
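
The rule discussed in the quoted thread (run the partitioning job when the number of splits differs from the number of tasks, or when the configuration forces it) can be sketched roughly as below, together with the variant argued for above, where a user-defined partitioner also triggers the job. This is only an illustration: the class, method, and parameter names are hypothetical and are not taken from BSPJob or BSPJobClient.

```java
/** Hypothetical helper for illustration; these are not methods of BSPJob or BSPJobClient. */
final class PartitioningDecisionSketch {

    /** Rule as described in the thread: run when the counts differ or the configuration forces it. */
    static boolean describedRule(int numSplits, int numTasks, boolean forcedByConfig) {
        return numSplits != numTasks || forcedByConfig;
    }

    /** Variant argued for in the thread: a user-defined partitioner also triggers the job. */
    static boolean withUserPartitioner(int numSplits, int numTasks,
                                       boolean forcedByConfig, boolean userPartitionerSet) {
        return userPartitionerSet || describedRule(numSplits, numTasks, forcedByConfig);
    }

    public static void main(String[] args) {
        // One input split on the local file system, two tasks requested.
        System.out.println(describedRule(1, 2, false));
        // Counts match and nothing is forced: the described rule skips the job.
        System.out.println(describedRule(2, 2, false));
        // Counts match, but the user asked for hash or range partitioning.
        System.out.println(withUserPartitioner(2, 2, false, true));
    }
}
```

The second and third calls show the gap raised in the thread: with equal counts, the described rule alone would skip a partitioning job that the user explicitly requested.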



-- 
Best Regards, Edward J. Yoon
@eddieyoon
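
For readers unfamiliar with the two partitioning styles named in the thread (hash key partitioning and range partitioning of an unsorted data set), here is a minimal, self-contained sketch. The `KeyPartitioner` interface and both classes are hypothetical and defined only for this example; they are not Hama's Partitioner API, and the split points are made-up values.

```java
import java.util.Arrays;

/** Hypothetical interface for illustration only; it is not Hama's actual Partitioner API. */
interface KeyPartitioner<K> {
    /** Returns a partition index in [0, numPartitions). */
    int partitionFor(K key, int numPartitions);
}

/** Hash-key partitioning: equal keys always land in the same partition. */
final class HashKeyPartitioner<K> implements KeyPartitioner<K> {
    @Override
    public int partitionFor(K key, int numPartitions) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}

/** Range partitioning: sorted, distinct split points divide the key space into contiguous ranges. */
final class RangePartitioner implements KeyPartitioner<Integer> {
    // Exclusive upper bound of every partition except the last one.
    private final int[] upperBounds;

    RangePartitioner(int[] sortedUpperBounds) {
        this.upperBounds = sortedUpperBounds.clone();
    }

    @Override
    public int partitionFor(Integer key, int numPartitions) {
        if (numPartitions != upperBounds.length + 1) {
            throw new IllegalArgumentException("expected " + (upperBounds.length + 1) + " partitions");
        }
        int pos = Arrays.binarySearch(upperBounds, key);
        // Found at index i: the key equals an exclusive bound, so it belongs to partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1, and insertionPoint is the partition.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}

final class PartitioningSketch {
    public static void main(String[] args) {
        // Assumed split points: keys below 10, keys from 10 to 19, keys from 20 upward.
        KeyPartitioner<Integer> range = new RangePartitioner(new int[] {10, 20});
        int[] unsorted = {25, 3, 10, 19, 42, 7};
        for (int key : unsorted) {
            System.out.println(key + " -> partition " + range.partitionFor(key, 3));
        }
    }
}
```

Neither scheme depends on how many input splits exist, which is why a partitioning pass can still be needed when the split and task counts already match.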
