hadoop-common-user mailing list archives

From JAX <jayunit...@gmail.com>
Subject Re: isSplitable() problem
Date Mon, 23 Apr 2012 13:13:37 GMT
Curious: it seems like you could aggregate the results in the mapper, in a local variable
or a list of strings. Is there a way to know that your mapper has just read the LAST line
of an input?

I.e., if so, you could implement your entire solution in your mapper without needing a
new input format?

Is there a "cleanup" or "finalize" method in mappers that is run at the end of a whole stream
read, to support these sorts of chunked, in-memory map/reduce operations?
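
To make the idea concrete, here is a rough sketch of the pattern I have in mind. It deliberately does not use any Hadoop classes: `SketchMapper`, `AggregatingMapper`, and `runOver` are hypothetical names made up for illustration. As far as I know, the new-API `org.apache.hadoop.mapreduce.Mapper` has `setup`, `map`, and `cleanup` hooks, and its `run` method calls them in that order; the stand-in below only mimics that lifecycle. It also assumes the whole input fits in memory, and that the mapper really does see the whole file (i.e. the input is not split).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WholeInputAggregation {

    /** Hypothetical stand-in for a mapper lifecycle: setup, map per record, cleanup. */
    static abstract class SketchMapper {
        protected void setup(List<String> out) {}

        protected abstract void map(long offset, String line, List<String> out);

        protected void cleanup(List<String> out) {}

        /** Drives the lifecycle: setup once, map once per record, cleanup once. */
        public final void run(Iterable<String> records, List<String> out) {
            setup(out);
            long offset = 0;
            for (String line : records) {
                map(offset, line, out);
                offset += line.length() + 1; // +1 for the assumed newline separator
            }
            cleanup(out);
        }
    }

    /** Buffers every line in map() and emits one combined record in cleanup(). */
    static class AggregatingMapper extends SketchMapper {
        private final List<String> buffer = new ArrayList<>();

        @Override
        protected void map(long offset, String line, List<String> out) {
            buffer.add(line); // nothing is emitted per line
        }

        @Override
        protected void cleanup(List<String> out) {
            // Runs after the last record, so the buffer holds the whole input.
            out.add(String.join("\n", buffer));
        }
    }

    /** Convenience helper: runs the aggregating mapper over the given lines. */
    static List<String> runOver(List<String> lines) {
        List<String> out = new ArrayList<>();
        new AggregatingMapper().run(lines, out);
        return out;
    }

    public static void main(String[] args) {
        List<String> result = runOver(Arrays.asList("first line", "second line", "third line"));
        System.out.println("records emitted: " + result.size());
        System.out.println(result.get(0));
    }
}
```

With something like this, the mapper never needs to know which line is the last one; it just waits for the cleanup call.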

Jay Vyas 

On Apr 23, 2012, at 6:40 AM, Dan Drew <wirefreak@googlemail.com> wrote:

> I require each input file to be processed by each mapper as a whole.
> I subclass c.o.a.h.mapreduce.lib.input.TextInputFormat and override
> isSplitable() to invariably return false.
> The job is configured to use this subclass as the input format class via
> setInputFormatClass(). The job runs without error, yet the logs reveal
> files are still processed line by line by the mappers.
> Any help would be greatly appreciated,
> Thanks
