hama-user mailing list archives

From "Edward J. Yoon" <edw...@datasayer.com>
Subject Re: Question about FileInputFormat splits
Date Mon, 20 Oct 2014 15:06:05 GMT
Hi Leonidas,

The bsp.min.split.size property is used to avoid creating too many tasks, as in Hadoop MR
(NOTE: if bsp.min.split.size is less than the block size, then one block is sent to each task).
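
To make the note above concrete, here is a minimal sketch of that rule in plain Java. It is not Hama's actual FileInputFormat code: the class and method names are hypothetical, and the 64 MB block size and 100 MB file size are assumed values chosen so that the file spans 2 blocks.

```java
public class SplitSizeSketch {
    // Rule as described in this message (an assumption about behaviour,
    // not Hama's source): a split is never smaller than one block.
    public static long effectiveSplitSize(long minSplitSize, long blockSize) {
        return Math.max(minSplitSize, blockSize);
    }

    // Number of tasks = number of splits needed to cover the file.
    public static long numTasks(long fileSize, long minSplitSize, long blockSize) {
        long split = effectiveSplitSize(minSplitSize, blockSize);
        return (fileSize + split - 1) / split; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // assumed HDFS block size
        long fileSize = 100L * 1024 * 1024;  // assumed file size (2 blocks)

        // Min split size below the block size: one task per block.
        System.out.println(numTasks(fileSize, 10000L, blockSize));
        // Min split size above the block size: fewer, larger splits.
        System.out.println(numTasks(fileSize, 128L * 1024 * 1024, blockSize));
    }
}
```

Under these assumptions, a 10000-byte minimum gives 2 tasks, while a 128 MB minimum gives 1 task. In other words, the property can only reduce the number of tasks, not increase it.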

I guess this will work fine. BTW, if you set an input partitioner, it creates as many new
partitions as you specified in the setNumBspTask() method (a graph job runs the (hash) input
partitioning step by default).
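
As an illustration of the idea only, the sketch below redistributes records into a fixed number of partitions by key hash, so the partition count follows the requested task count rather than the number of blocks. It does not use the Hama API; the class and method names are hypothetical, and Hama's own partitioner may compute the index differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HashPartitionSketch {
    // Map a key to a partition index in [0, numTasks).
    // floorMod keeps the index non-negative even for negative hash codes.
    public static int partitionOf(Object key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Redistribute keys into exactly numTasks partitions,
    // regardless of how the input was originally split.
    public static List<List<String>> repartition(List<String> keys, int numTasks) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partitionOf(key, numTasks)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("v1", "v2", "v3", "v4", "v5", "v6");
        List<List<String>> parts = repartition(keys, 10);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

Some partitions may stay empty when there are fewer distinct keys than tasks, but the number of partitions always matches the requested task count.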

Thanks.

--
Best Regards, Edward J. Yoon
Chief Executive Officer
DataSayer Co., Ltd.

> On Oct 20, 2014, at 10:51 PM, Leonidas Fegaras <fegaras@cse.uta.edu> wrote:
> 
> Dear Hama developers,
> I still have a problem setting the split size of an HDFS input file using Hama 0.6.4.
> For example, when I use:
> 
> BSPJob job = new BSPJob(conf,BSPop.class);
> job.setNumBspTask(10);
> job.setLong("bsp.min.split.size",10000L);   // 10000 bytes
> 
> For a small file with 2 blocks, this will use only 2 BSP tasks (one for each block),
> instead of 10.
> This used to work in Hama 0.5.0.
> Any suggestions?
> Thanks.
> Leonidas Fegaras
> 
> On 01/04/2013 05:45 PM, Edward J. Yoon wrote:
>> Hello,
>> 
>>> than a block. But if you have more nodes in your cluster than data blocks,
>>> you may get faster execution if you allow splits smaller than a block. Is
>> You're right. So, we're working on partitioning issues now.
>> 
>>> you may get faster execution if you allow splits smaller than a block. Is
>>> there any way to use splits smaller than a block in Hama 0.6.0?
>> Yes. But the Hama 0.6.1 release will support it.
>> 
>> On Sat, Jan 5, 2013 at 4:59 AM, Leonidas Fegaras <fegaras@cse.uta.edu> wrote:
>>> Dear Hama developers,
>>> It seems that the splits generated by the FileInputFormat in Hama 0.6.0
>>> cannot be smaller than a block. In Hama 0.5.0, I could set any split size
>>> using  job.set("bsp.min.split.size",...) and set the task numbers using
>>> job.setNumBspTask(...). This is ignored by Hama 0.6.0 for a split smaller
>>> than a block. But if you have more nodes in your cluster than data blocks,
>>> you may get faster execution if you allow splits smaller than a block. Is
>>> there any way to use splits smaller than a block in Hama 0.6.0?
>>> Thanks for your help,
>>> Leonidas
>>> 
>> 
>> 
> 

