hadoop-common-user mailing list archives

From Lance Amundsen <lc...@us.ibm.com>
Subject Re: InputFiles, Splits, Maps, Tasks Questions 1.3 Base
Date Thu, 18 Oct 2007 22:30:43 GMT
Thanks, I'll give that a try. It seems to me that a method to tell Hadoop to
split a file every "n" key/value pairs would be logical. Or maybe a
createSplitBoundary call when appending key/value records?

I just want a way, and not a very complex one, to control the number of maps
and how records are divided among them. Creating a separate file per
record group is too slow for my purposes.

Lance

IBM Software Group - Strategy
Performance Architect
High-Performance On Demand Solutions (HiPODS)

650-678-8425 cell

From Doug Cutting <cutting@apache.org>
To hadoop-user@lucene.apache.org
Date 10/18/2007 03:21 PM
Subject Re: InputFiles, Splits, Maps, Tasks Questions 1.3 Base
Please respond to hadoop-user@lucene.apache.org

Lance Amundsen wrote:
> There's lots of references on decreasing DFS block size to increase maps to
> record ratios.  What is the easiest way to do this?  Is it possible with
> the standard SequenceFile class?

You could specify the block size by setting the dfs.block.size property in
the Configuration passed to SequenceFile#createWriter().  But if you simply
want to create sub-block-size splits, then increasing the number of map
tasks should do that.

Doug
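
To make the second suggestion concrete, here is a rough sketch of the split-size arithmetic that Hadoop's file-based input formats are generally described as using: the split size is the requested per-map share of the input, capped at the block size and floored at a minimum size. This is an illustration under that assumption, not code from Hadoop itself. The class and method names (SplitSizeSketch, computeSplitSize, estimateSplits) are hypothetical, and the estimate ignores any rounding slack the real implementation may apply.

```java
// Sketch only: illustrates the split-size arithmetic assumed above.
// Not Hadoop code; all names here are hypothetical.
final class SplitSizeSketch {

    // Split size = max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of splits for one file of totalSize bytes when
    // requestedMaps map tasks are asked for.
    static long estimateSplits(long totalSize, int requestedMaps,
                               long minSize, long blockSize) {
        // Per-map share of the input; guard against a request of zero maps.
        long goalSize = totalSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        return (totalSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;     // 1 GiB input file
        long blockSize = 64L << 20; // 64 MiB DFS block
        // Few requested maps: splits stay block-sized.
        System.out.println(estimateSplits(oneGiB, 4, 1, blockSize));
        // Many requested maps: splits shrink below the block size.
        System.out.println(estimateSplits(oneGiB, 64, 1, blockSize));
    }
}
```

Under these assumptions, a 1 GiB file with 64 MiB blocks still yields 16 block-sized splits when 4 maps are requested, while requesting 64 maps shrinks each split to 16 MiB and yields 64 splits. That is the sense in which raising the number of map tasks produces sub-block-size splits without touching dfs.block.size.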


