hadoop-common-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Custom input split
Date Mon, 27 Dec 2010 02:34:01 GMT
Please don't use attachments. They should be stripped by the Apache
mailer. There are a bunch of mail archiver sites which don't save


On Sun, Dec 26, 2010 at 8:20 AM, Harsh J <qwertymaniac@gmail.com> wrote:
> Hi,
> On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS)
> <Michael.Black2@ngc.com> wrote:
>> I assume there's a way to make a specific # of splits and add each document to the
>> separate splits...but I'll be darned if I can find the docs or an example to show this.
> Would CombineFileInputFormat and CombineFileSplit be what you're looking for?
> Doc links: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
> & http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileSplit.html
>> As I said I'm using hadoop-0.20.2 which I know makes a difference as so many things
>> get deprecated on each release.  Old references don't seem to work.
> The API marked deprecated in 0.20.{0,1,2} has been un-deprecated in
> the 0.21.0 release and is also considered the "stable" API. You
> can continue using it, as it is still supported.
> (Maybe 0.20.3 will have them un-deprecated too, I'm not sure what's
> the status on that, although doing so would surely help avoid beginner
> confusion.)
> --
> Harsh J
> www.harshj.com

Lance Norskog
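
For readers of the archive: the question in the thread is how to end up with a fixed number of splits, each holding several documents. The sketch below shows only that grouping step in plain Java, with no Hadoop dependency. The class name FixedSplitSketch, the method group, and the file names are hypothetical and come from neither the thread nor Hadoop. In a real job, each resulting group would presumably back one split (for example a CombineFileSplit) returned from a custom input format's getSplits; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, not part of the Hadoop API.
public class FixedSplitSketch {

    /**
     * Distributes the given paths round-robin across exactly numSplits groups.
     * Groups stay empty when there are fewer paths than groups.
     */
    static List<List<String>> group(List<String> paths, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            groups.add(new ArrayList<String>());
        }
        // Path i goes to group (i mod numSplits).
        for (int i = 0; i < paths.size(); i++) {
            groups.get(i % numSplits).add(paths.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up document names, for illustration.
        List<String> docs = Arrays.asList(
                "doc1.txt", "doc2.txt", "doc3.txt", "doc4.txt", "doc5.txt");
        List<List<String>> groups = group(docs, 2);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("split " + i + ": " + groups.get(i));
        }
    }
}
```

Round-robin assignment ignores file sizes and block locations. A grouping that balances total bytes per split or keeps node-local files together may suit a real cluster better.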
