hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Custom input split
Date Sun, 26 Dec 2010 16:20:41 GMT

On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS)
<Michael.Black2@ngc.com> wrote:
> I assume there's a way to make a specific # of splits and add each document to the separate
> splits...but I'll be darned if I can find the docs or an example to show this.

Would CombineFileInputFormat and CombineFileSplit be what you're looking for?

Doc links: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
& http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileSplit.html
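
To illustrate the idea of packing several documents into a fixed number of
splits, here is a minimal plain-Java sketch that uses no Hadoop classes. The
class and method names (SplitGrouping, groupIntoSplits) are hypothetical, and
the round-robin assignment is an assumption made for simplicity;
CombineFileInputFormat itself decides the grouping from file sizes and block
locations, so treat this as a picture of the concept rather than of the
library's behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: it only shows the grouping idea.
class SplitGrouping {

    // Distribute document paths round-robin across exactly numSplits groups.
    static List<List<String>> groupIntoSplits(List<String> paths, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < paths.size(); i++) {
            // Document i goes to group (i mod numSplits).
            groups.get(i % numSplits).add(paths.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        // prints [[doc1, doc3, doc5], [doc2, doc4]]
        System.out.println(groupIntoSplits(docs, 2));
    }
}
```

In a real job, each group would correspond to the set of paths carried by one
CombineFileSplit, so each map task would process one group.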

> As I said I'm using hadoop-0.20.2 which I know makes a difference as so many things get
> deprecated on each release.  Old references don't seem to work.

The API marked deprecated in 0.20.{0,1,2} has been un-deprecated in
the 0.21.0 release and is considered the "stable" API. You can
continue using it, as it is still supported.

(Maybe 0.20.3 will have them un-deprecated too, I'm not sure what's
the status on that, although doing so would surely help avoid beginner
Harsh J
