hadoop-common-user mailing list archives

From "Black, Michael (IS)" <Michael.Bla...@ngc.com>
Subject Custom input split
Date Fri, 24 Dec 2010 16:34:57 GMT
Using hadoop-0.20

I'm doing custom input splits from a Lucene index.

I want to split the document IDs across N mappers (I'm testing the
scalability of the problem across 4 nodes and 8 cores).

The key is the document number, and the numbers are not sequential.

At this point I'm calling splits.add once per document, but that sets up
one task for every document, which is of course not what I want.

How can I add a group of documents to each split?  I found a scant reference
to PrimeInputSplit but that doesn't seem to resolve on hadoop-0.20.
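
To make the question concrete, here is a minimal sketch of the grouping I have in mind. It is plain Java with no Hadoop imports so that it runs on its own. DocIdGroup and partition are hypothetical names, not Hadoop classes. The assumption is that in a real job this class would extend org.apache.hadoop.mapreduce.InputSplit and implement Writable (adding getLength() and getLocations()), that getSplits() would return one such object per desired mapper, and that a custom RecordReader would iterate over the IDs held by its split.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a custom InputSplit that carries a group of
// (non-sequential) Lucene document IDs instead of a single document.
class DocIdGroup {
    private int[] docIds = new int[0];

    // No-arg constructor: needed so the object can be rebuilt via readFields().
    DocIdGroup() {}

    DocIdGroup(int[] docIds) {
        this.docIds = docIds;
    }

    int[] getDocIds() {
        return docIds;
    }

    // Same shape as Writable.write(): length first, then each ID.
    public void write(DataOutput out) throws IOException {
        out.writeInt(docIds.length);
        for (int id : docIds) {
            out.writeInt(id);
        }
    }

    // Same shape as Writable.readFields(): mirror of write().
    public void readFields(DataInput in) throws IOException {
        docIds = new int[in.readInt()];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = in.readInt();
        }
    }

    // Divide the IDs into at most numSplits contiguous groups whose sizes
    // differ by no more than one. Each group would become one split (one task).
    static List<DocIdGroup> partition(int[] allIds, int numSplits) {
        List<DocIdGroup> groups = new ArrayList<DocIdGroup>();
        if (allIds.length == 0 || numSplits <= 0) {
            return groups;
        }
        int n = Math.min(numSplits, allIds.length);
        int base = allIds.length / n;   // minimum group size
        int extra = allIds.length % n;  // the first 'extra' groups get one more
        int start = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            groups.add(new DocIdGroup(Arrays.copyOfRange(allIds, start, start + size)));
            start += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] ids = {3, 7, 19, 42, 57, 101, 250, 999, 1024};
        for (DocIdGroup g : partition(ids, 4)) {
            System.out.println(Arrays.toString(g.getDocIds()));
        }
    }
}
```

With this shape, splits.add would be called once per group rather than once per document, so the number of map tasks follows N and not the index size.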

Michael D. Black
Senior Scientist
Northrop Grumman Information Systems
Advanced Analytics Directorate
