hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Partitioner in Hama
Date Tue, 08 Jan 2013 09:46:30 GMT
Please review this:

http://wiki.apache.org/hama/Partitioning

On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> I mean, pre-partitioning or resizing partitions is really important.
>
> On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> This is a separate topic ...
>>
>> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
>> small in size but high in computational complexity, such as graph,
>> sparse matrix, and machine learning algorithms.
>>
>> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>> Even when the numbers of splits and tasks are the same, the user-defined
>>> partitioning job should still be run, because it is not only for resizing
>>> partitions (for example, range partitioning of an unsorted data set or
>>> hash key partitioning, ..., etc).
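
To make the two partitioning styles mentioned above concrete, here is a minimal, self-contained sketch in plain Java. It does not use Hama's Partitioner interface; the class and method names (PartitionSketch, hashPartition, rangePartition) and the boundary values are assumptions made up for illustration only.

```java
import java.util.Arrays;

// Hypothetical sketch, not Hama code: two ways to map a key to a partition index.
final class PartitionSketch {

    // Hash key partitioning: the same key always lands in the same partition.
    static int hashPartition(String key, int numPartitions) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Range partitioning: sortedUpperBounds[i] is the exclusive upper bound of
    // partition i; keys at or above the last bound go to one extra final partition.
    static int rangePartition(int key, int[] sortedUpperBounds) {
        int pos = Arrays.binarySearch(sortedUpperBounds, key);
        // A key equal to a bound belongs to the next partition; otherwise
        // binarySearch returns -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] bounds = {10, 20}; // assumed boundaries: [..10), [10..20), [20..)
        for (int key : new int[] {25, 5, 15}) { // unsorted input keys
            System.out.println(key + " -> partition " + rangePartition(key, bounds));
        }
        System.out.println("apple -> partition " + hashPartition("apple", 2));
    }
}
```

Either function has to see every record, even when the number of splits already equals the number of tasks, which is why the partitioning pass is not only about resizing.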
>>>
>>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
>>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's named
>>>>>    as so in the HEAD (1429573) of trunk. It isn't removed but it isn't
>>>>>    referred to anywhere else. I can't find any references to it in the
>>>>>    workspace.
>>>>>
>>>>
>>>> It is referred to in the BSPJob#waitForCompletion function as a separate BSP job
>>>> to create the specified splits.
>>>>
>>>>
>>>>>    2. job.setPartitioner is the same as setting
>>>>>    "bsp.input.partitioner.class". Anyway, as I see it, partitions are not
>>>>>    being created, because of which the following happens.
>>>>>    If I am running the task on local fs and not hdfs, there's just one
>>>>>    input split, and even if I set a partitioner to create two partitions and
>>>>>    set bsp.setNumTasks(2), this is overridden and only one task is executed.
>>>>>    See BSPJobClient#submitJobInternal(),
>>>>>    where it does the following:
>>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks)); Line
>>>>>    326.
>>>>>
>>>> This job is set to run if the number of splits != number of Tasks, or if
>>>> forced by the configuration. I can share the current state of my HAMA-700
>>>> patch with you.
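
The rule described above (run the partitioning job when the split count differs from the task count, or when the configuration forces it) can be sketched as a single predicate. This is only an illustration of the stated condition, not the actual code in BSPJob or the HAMA-700 patch; the class name, method name, and parameters are hypothetical.

```java
// Hypothetical sketch of the stated condition; not taken from Hama's source.
final class PartitioningDecision {

    // numSplits: number of input splits found for the job
    // numTasks:  number of BSP tasks requested by the user
    // forced:    true when the configuration asks for partitioning regardless
    static boolean shouldRunPartitioningJob(int numSplits, int numTasks, boolean forced) {
        return forced || numSplits != numTasks;
    }

    public static void main(String[] args) {
        // Local fs case from this thread: one split, two requested tasks.
        System.out.println(shouldRunPartitioningJob(1, 2, false));
        // Equal counts, not forced: under this rule the job is skipped.
        System.out.println(shouldRunPartitioningJob(2, 2, false));
    }
}
```

Under this rule a user-defined partitioner would be skipped whenever the counts happen to match, unless the job is forced.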
>>>>
>>>>
>>>>>    3. So here is what I think is happening: Partitioner is not in the
>>>>>    codepath (try putting a breakpoint inside the partitioner and executing
>>>>>    a non-graph bsp task), so partitions are not being created and
>>>>>    writeSplits() is returning 1.
>>>>>    [ writeSplits() returns the number of splits in the input. ]
>>>>>
>>>>
>>>> Probably because it is running as a separate process?



-- 
Best Regards, Edward J. Yoon
@eddieyoon
