hadoop-common-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: [Input split] File manipulation
Date Tue, 17 Aug 2010 15:44:54 GMT
How large is your input? If it is large enough, you do not need to
worry about the splitting: only one split (the last one) has a
different size, and all the other splits have the same size.
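
To illustrate the point about size-based splitting, here is a minimal sketch in plain Java. It is a simplified model, not Hadoop's actual split code: the class name SplitSizes, the method splitSizes, and the 200-byte/64-byte figures are made up for the example, and a real input format applies further rules (block size, configured minimum and maximum split sizes) that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSizes {
    // Hypothetical helper: cut a file of fileSize bytes into consecutive
    // chunks of at most splitSize bytes and return the size of each chunk.
    static List<Long> splitSizes(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> sizes = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            // Every chunk is full-sized except possibly the final one.
            long size = Math.min(splitSize, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Assumed sizes for illustration: a 200-byte file, 64-byte splits.
        System.out.println(splitSizes(200L, 64L));
    }
}
```

In this model only the final entry can be smaller than the others. It also shows why a tiny file, like the nine-line example quoted below, would fit in a single split when splitting is driven by size alone.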



On Tue, Aug 17, 2010 at 7:50 AM, Erik Test <erik.shiken@gmail.com> wrote:
> Hello,
>
> I'm trying to determine how to split a file evenly so each map task has a
> similar work load. The input I will have is a list of coordinates like this:
>
> 2,8
> 3,9
> 4,10
> 5,7
> 6,2
> 7,3
> 8,1
> 9,0
> 10,4
>
> Since there are 9 inputs in this example, I would like to split the records
> so that there would be 3 map tasks.
>
> I've been looking into different text input format classes but I'm still not
> sure how to split the input file the way I would like to.
>
> Does anyone have advice or suggestions on how I can manipulate the
> input splits by specifying the number of lines in each input split?
>
> Erik
>
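
For the line-count approach asked about above, the grouping itself can be sketched in plain Java. This is only an illustration of the idea, assuming the whole input fits in memory; the class name LineGroups and the method groupLines are hypothetical. Hadoop ships an NLineInputFormat class intended for splitting by a fixed number of lines, but the sketch does not use it, and how that class is configured depends on the Hadoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineGroups {
    // Hypothetical helper: group records into chunks of linesPerSplit lines.
    // The last chunk may be shorter if the count does not divide evenly.
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            // Copy the sublist so each group is independent of the input list.
            groups.add(new ArrayList<>(lines.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // The nine coordinate records from the message above.
        List<String> lines = Arrays.asList(
                "2,8", "3,9", "4,10", "5,7", "6,2", "7,3", "8,1", "9,0", "10,4");
        // Three lines per group gives three groups, one per intended map task.
        for (List<String> group : groupLines(lines, 3)) {
            System.out.println(group);
        }
    }
}
```

With nine records and three lines per group, each of the three groups would correspond to one map task in the scenario described.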



-- 
Best Regards

Jeff Zhang
