hadoop-common-user mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: InputFiles, Splits, Maps, Tasks Questions 1.3 Base
Date Wed, 24 Oct 2007 20:05:46 GMT

On Oct 24, 2007, at 12:42 PM, Doug Cutting wrote:

> Lance Amundsen wrote:
>> OK, that is encouraging.  I'll take another pass at it.  I succeeded
>> yesterday with an in-memory-only InputFormat, but only after I
>> commented out some of the split-referencing code, like the following
>> in MapTask.java:
>>     if (instantiatedSplit instanceof FileSplit) {
>>       FileSplit fileSplit = (FileSplit) instantiatedSplit;
>>       job.set("map.input.file", fileSplit.getPath().toString());
>>       job.setLong("map.input.start", fileSplit.getStart());
>>       job.setLong("map.input.length", fileSplit.getLength());
>>     }
> Yes, that code should not exist, but it shouldn't affect you  
> either. You should be subclassing InputSplit, not FileSplit, so  
> this code shouldn't operate on your splits.

That code does nothing when the splits are not file splits, so it
absolutely shouldn't break anything. Applications depend on those
attributes to know which split they are working on, and there isn't a
better fix until we move to context objects. I know that non-file
splits work because there are unit tests to make sure they don't
break anything.

-- Owen
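
A minimal sketch of why the guard quoted above is a no-op for non-file splits. This is not the Hadoop source: the InputSplit and FileSplit types below are simplified stand-ins, InMemorySplit is a hypothetical custom split, and a plain Map stands in for the job configuration. Only the three "map.input.*" keys come from the quoted snippet.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitGuardSketch {

    // Simplified stand-in for Hadoop's InputSplit (assumption, not the real interface).
    interface InputSplit {
        long getLength();
    }

    // Simplified stand-in for FileSplit: a byte range within a file.
    static class FileSplit implements InputSplit {
        private final String path;
        private final long start;
        private final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        String getPath() { return path; }
        long getStart() { return start; }
        public long getLength() { return length; }
    }

    // Hypothetical in-memory split: implements InputSplit directly,
    // it does NOT extend FileSplit.
    static class InMemorySplit implements InputSplit {
        private final byte[] data;

        InMemorySplit(byte[] data) { this.data = data; }

        public long getLength() { return data.length; }
    }

    // Mirrors the shape of the quoted guard, with a Map standing in
    // for the job configuration object.
    static Map<String, String> describeSplit(InputSplit split) {
        Map<String, String> job = new HashMap<>();
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            job.put("map.input.file", fileSplit.getPath());
            job.put("map.input.start", Long.toString(fileSplit.getStart()));
            job.put("map.input.length", Long.toString(fileSplit.getLength()));
        }
        // For any other split type the map is returned untouched.
        return job;
    }

    public static void main(String[] args) {
        // A file split sets all three attributes.
        System.out.println(describeSplit(new FileSplit("/data/part-00000", 0L, 1024L)));
        // A non-file split sets none of them.
        System.out.println(describeSplit(new InMemorySplit(new byte[16])));
    }
}
```

Under these assumptions, a split that implements the base interface directly never enters the instanceof branch, so the attributes are simply left unset for it.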
