hadoop-common-user mailing list archives

From Erik Test <erik.shi...@gmail.com>
Subject [Input split] File manipulation
Date Tue, 17 Aug 2010 14:50:52 GMT
Hello,

I'm trying to determine how to split a file evenly so that each map task has a
similar workload. The input will be a list of coordinates like this:

2,8
3,9
4,10
5,7
6,2
7,3
8,1
9,0
10,4

Since there are 9 inputs in this example, I would like to split the records
so that there would be 3 map tasks.
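
To make the desired grouping concrete, here is a minimal sketch in plain Java. It uses no Hadoop classes; the class name `LineSplitSketch`, the method `splitByLines`, and the value of 3 lines per split are assumptions made for illustration only. It shows how the 9 records would be divided among map tasks, not how Hadoop computes its splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: illustrates the requested grouping only, not a Hadoop API.
class LineSplitSketch {

    // Group records into consecutive chunks of at most linesPerSplit lines.
    static List<List<String>> splitByLines(List<String> records, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < records.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, records.size());
            // Copy the sublist so each split is independent of the input list.
            splits.add(new ArrayList<>(records.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // The nine coordinate records from the example above.
        List<String> records = Arrays.asList(
                "2,8", "3,9", "4,10", "5,7", "6,2", "7,3", "8,1", "9,0", "10,4");

        // Assumed: 3 lines per split, so 9 records yield 3 groups (one per map task).
        List<List<String>> splits = splitByLines(records, 3);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + ": " + splits.get(i));
        }
    }
}
```

Hadoop ships an `NLineInputFormat` class that gives each map task a fixed number of input lines. Whether it suits this particular job is not established here.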

I've been looking into different text input format classes but I'm still not
sure how to split the input file the way I would like to.

Does anyone have advice or suggestions on how I can control the input splits
by specifying the number of lines in each split?

Erik
