hadoop-common-user mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: InputFiles, Splits, Maps, Tasks Questions 1.3 Base
Date Thu, 18 Oct 2007 23:03:41 GMT

On Oct 18, 2007, at 3:30 PM, Lance Amundsen wrote:

> Thx, I'll give that a try. Seems to me a method to tell hadoop to split a
> file every "n" key/value pairs would be logical. Or maybe a
> createSplitBoundary when appending key/value records?

The problem is that the split generator doesn't want to read the data  
files. So it picks byte ranges as a reasonable proxy. I know of some  
applications that have custom input formats that use md5 ranges as  
input splits and read multiple files for each split. You could  
equivalently use rows, as long as you had an index.
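
To illustrate the row-based variant, here is a minimal sketch in plain Python, not the Hadoop API. It assumes a hypothetical index that already records the byte offset at which each record starts, so splits of "n" rows can be computed as byte ranges without opening the data file. The names row_splits, record_offsets, and rows_per_split are made up for the example.

```python
def row_splits(record_offsets, file_length, rows_per_split):
    """Turn a per-record offset index into (start_byte, length) splits.

    record_offsets: sorted byte offsets where each record begins
                    (hypothetical index, built when the file was written).
    file_length:    total size of the data file in bytes.
    rows_per_split: number of records each split should cover.
    """
    if rows_per_split <= 0:
        raise ValueError("rows_per_split must be positive")
    splits = []
    for i in range(0, len(record_offsets), rows_per_split):
        start = record_offsets[i]
        next_i = i + rows_per_split
        # The split ends where the next split's first record begins,
        # or at end of file for the last split.
        if next_i < len(record_offsets):
            end = record_offsets[next_i]
        else:
            end = file_length
        splits.append((start, end - start))
    return splits


# Hypothetical index for a 350-byte file holding 7 records.
offsets = [0, 40, 95, 130, 210, 260, 300]
print(row_splits(offsets, file_length=350, rows_per_split=3))
# prints [(0, 130), (130, 170), (300, 50)]
```

Only the index is consulted here, which is the point: the split generator still never reads the data files, it just gets its boundaries from somewhere other than a fixed byte count.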

-- Owen
