hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Partitioner in Hama
Date Sun, 06 Jan 2013 21:15:11 GMT
This is a separate topic, but ...

Unlike MapReduce, I think Hama BSP will mostly handle tasks whose input
is small in size but high in computational complexity, such as graph,
sparse matrix, and machine learning algorithms.

On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> Even when the numbers of splits and tasks are the same, a user-defined
> partitioning job should still be run, because partitioning is not only
> for resizing partitions. Examples include range partitioning of an
> unsorted data set, hash key partitioning, etc.
> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <surajsmenon@apache.org> wrote:
>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner; that is
>>>    its name in the HEAD (1429573) of trunk. It hasn't been removed, but I
>>>    can't find any references to it anywhere else in the workspace.
>> It is referenced in the BSPJob#waitForCompletion function, which runs it
>> as a separate BSP job to create the specified splits.
>>>    2. job.setPartitioner is the same as setting
>>>    "bsp.input.partitioner.class". Anyway, as far as I can tell,
>>>    partitions are not being created, which causes the following.
>>>    If I run the task on the local fs and not HDFS, there is just one
>>>    input split, and even if I set a partitioner to create two partitions
>>>    and set bsp.setNumTasks(2), this is overridden and only one task is
>>>    executed.
>>>    See BSPJobClient#submitJobInternal(), which does the following
>>>    (line 326):
>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>> This job is set to run if the number of splits != number of tasks or if
>> forced by the configuration. I can share the current state of my HAMA-700
>> patch with you.
>>>    3. So here is what I think is happening: the Partitioner is not in the
>>>    code path (try putting a breakpoint inside the partitioner and
>>>    executing a non-graph BSP task), so partitions are not being created
>>>    and writeSplits() is returning 1.
>>>    [ writeSplits() returns the number of splits in the input. ]
>> Probably because it is running as a separate process?
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

Best Regards, Edward J. Yoon
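
To make the quoted point about range and hash key partitioning concrete, here is a rough sketch in plain Java. It is not the Hama API: KeyPartitioner, HashKeyPartitioner, RangePartitioner, PartitionSketch and the split point 20 are all made-up names and values for illustration only. It shows that with two tasks in both cases, the two partitioners still place the same unsorted keys differently, so the partitioning step does more than resize partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hypothetical interface for illustration; not the actual Hama Partitioner.
    interface KeyPartitioner<K> {
        int getPartition(K key, int numTasks);
    }

    // Hash key partitioning: equal keys always map to the same task.
    static final class HashKeyPartitioner<K> implements KeyPartitioner<K> {
        @Override
        public int getPartition(K key, int numTasks) {
            // floorMod keeps the index non-negative even for negative hash codes.
            return Math.floorMod(key.hashCode(), numTasks);
        }
    }

    // Range partitioning: keys below the first split point go to task 0,
    // keys below the second go to task 1, and so on.
    static final class RangePartitioner implements KeyPartitioner<Integer> {
        private final int[] splitPoints; // assumed to be in ascending order

        RangePartitioner(int... splitPoints) {
            this.splitPoints = splitPoints.clone();
        }

        @Override
        public int getPartition(Integer key, int numTasks) {
            int p = 0;
            while (p < splitPoints.length && key >= splitPoints[p]) {
                p++;
            }
            // Clamp in case more split points than tasks were supplied.
            return Math.min(p, numTasks - 1);
        }
    }

    // Distributes the keys into numTasks buckets using the given partitioner.
    static <K> List<List<K>> partition(List<K> keys, KeyPartitioner<K> partitioner, int numTasks) {
        List<List<K>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (K key : keys) {
            buckets.get(partitioner.getPartition(key, numTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> unsorted = Arrays.asList(42, 7, 19, 3, 88, 51);
        System.out.println("range: " + partition(unsorted, new RangePartitioner(20), 2));
        System.out.println("hash:  " + partition(unsorted, new HashKeyPartitioner<>(), 2));
    }
}
```

Both calls use two tasks, yet the bucket contents differ, which is why a user-defined partitioner would still need to run when the split count already matches the task count.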
