hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Partitioner in Hama
Date Sun, 06 Jan 2013 21:17:07 GMT
I mean, pre-partitioning or resizing the partitions is really important.
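
To make the point concrete, here is a minimal sketch of the two partitioning styles named in the quoted messages below (hash key partitioning, and range partitioning of an unsorted data set). It is plain Java and does not use the Hama API: the class name PartitionSketch, every method name, and the userPartitionerSet flag are hypothetical, introduced only for illustration. The last method is one possible reading of the rule discussed in the thread (run when the number of splits differs from the number of tasks, when forced by configuration, or when a user-defined partitioner is set), not the actual Hama logic.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names come from Hama.
final class PartitionSketch {

    // Hash key partitioning: the same key always maps to the same task.
    static int hashPartition(String key, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Range partitioning: upperBounds[i] is the exclusive upper bound of
    // partition i; keys at or above the last bound go to the final partition.
    static int rangePartition(int key, int[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    // Redistribute unsorted keys into one bucket per partition.
    static List<List<Integer>> partitionByRange(int[] keys, int[] upperBounds) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i <= upperBounds.length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(rangePartition(key, upperBounds)).add(key);
        }
        return buckets;
    }

    // Assumed decision rule: equal split and task counts alone do not make
    // the partitioning job unnecessary if the user asked for a partitioner.
    static boolean shouldRunPartitioningJob(int numSplits, int numTasks,
                                            boolean forcedByConf,
                                            boolean userPartitionerSet) {
        return numSplits != numTasks || forcedByConf || userPartitionerSet;
    }

    public static void main(String[] args) {
        int[] unsorted = {42, 7, 19, 3, 88, 51};
        // Two bounds give three partitions: <20, 20..59, >=60.
        System.out.println(partitionByRange(unsorted, new int[] {20, 60}));
        // Same number of splits and tasks, but a user-defined partitioner is set.
        System.out.println(shouldRunPartitioningJob(2, 2, false, true));
    }
}
```

Even with two splits and two tasks, the records inside each split are not grouped by key or range until a pass like partitionByRange has run, which is why matching counts are not sufficient on their own.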

On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> This is a separate topic ...
>
> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
> small in size but high in computational complexity, such as graph,
> sparse matrix, and machine learning algorithms.
>
> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> Even when the numbers of splits and tasks are the same, a user-defined
>> partitioning job should still be run, because it is not only for
>> resizing partitions (for example, range partitioning of an unsorted
>> data set, hash key partitioning, etc.).
>>
>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner; that is its
>>>>    name at HEAD (1429573) of trunk. It hasn't been removed, but I can't
>>>>    find any references to it anywhere else in the workspace.
>>>>
>>>
>>> It is referenced in the BSPJob#waitForCompletion function, as a separate
>>> BSP job that creates the specified splits.
>>>
>>>
>>>>    2. job.setPartitioner is the same as setting
>>>>    "bsp.input.partitioner.class". Anyway, as I see it, partitions are not
>>>>    being created, and that leads to the following.
>>>>    If I run the task on the local fs rather than HDFS, there is just one
>>>>    input split. Even if I set a partitioner to create two partitions and
>>>>    call bsp.setNumTasks(2), this is overridden and only one task is
>>>>    executed. See BSPJobClient#submitJobInternal(), which does the
>>>>    following at line 326:
>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>>>>
>>> This job is set to run if the number of splits != the number of tasks, or if
>>> forced by the configuration. I can share the current state of my HAMA-700
>>> patch with you.
>>>
>>>
>>>>    3. So here is what I think is happening: the Partitioner is not in the
>>>>    code path (try putting a breakpoint inside the partitioner and executing
>>>>    a non-graph BSP task), so partitions are not being created and
>>>>    writeSplits() is returning 1.
>>>>    [ writeSplits() returns the number of splits in the input. ]
>>>>
>>>
>>> Probably because it is running as a separate process?
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
