hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Control the file splits size
Date Mon, 23 Aug 2010 19:14:12 GMT


Uhm...

There may be more to the initial question.

The OP indicated that this is a 'binary file', so the records are presumably not delimited by
an end-of-line.
He may therefore also want to look at how to read input that is not line-based.
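
To make the point about non-line-based records concrete, here is a minimal sketch in plain Java. It is not Hadoop code and not something the OP posted. It assumes the objects are fixed-size (the OP's "multiple of my objects size" suggests this, but it is an assumption), and the class and method names are hypothetical. The convention shown is that a record belongs to the split in which its first byte lies, so split boundaries do not need to line up with record boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of Hadoop.
final class FixedRecordSplitReader {

    /**
     * Returns the records whose first byte lies in [splitStart, splitStart + splitLength).
     * A record that begins inside the split is read in full even if it ends past the
     * split boundary; a record that began in an earlier split is skipped here.
     * A trailing partial record at the end of the file is dropped.
     */
    static List<byte[]> readSplit(byte[] file, long splitStart, long splitLength, int recordSize) {
        if (recordSize <= 0) {
            throw new IllegalArgumentException("recordSize must be positive");
        }
        long splitEnd = Math.min(splitStart + splitLength, file.length);
        // First record boundary at or after splitStart (round up to a multiple of recordSize).
        long pos = ((splitStart + recordSize - 1) / recordSize) * recordSize;
        List<byte[]> records = new ArrayList<>();
        while (pos < splitEnd && pos + recordSize <= file.length) {
            byte[] record = new byte[recordSize];
            System.arraycopy(file, (int) pos, record, 0, recordSize);
            records.add(record);
            pos += recordSize;
        }
        return records;
    }

    public static void main(String[] args) {
        // Ten 4-byte records; every byte of record i holds the value i.
        byte[] file = new byte[40];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) (i / 4);
        }
        // Deliberately use a split length (15) that is not a multiple of the record size.
        int total = 0;
        for (long start = 0; start < file.length; start += 15) {
            List<byte[]> records = readSplit(file, start, 15, 4);
            total += records.size();
            System.out.println("split at " + start + ": " + records.size() + " records");
        }
        System.out.println("total records read: " + total);
    }
}
```

In a real job the same boundary logic would presumably live in a custom record reader rather than in a static helper, and it would read from a stream instead of an in-memory array.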


> From: qwertymaniac@gmail.com
> Date: Mon, 23 Aug 2010 18:39:48 +0530
> Subject: Re: Control the file splits size
> To: common-user@hadoop.apache.org
> 
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path)
> 
> isSplitable is the method you're looking for -- return false from
> it in your custom input format (derived from FileInputFormat or similar).
> 
> On Mon, Aug 23, 2010 at 4:08 PM, Teodor Macicas <teodor.macicas@epfl.ch> wrote:
> > Hi all,
> >
> > Can anyone please tell me how to control the split size? I have one big
> > file which will be split according to the number of maps. The input file is binary
> > and contains some objects. I do not want to split an object into 2 separate
> > files, for sure.
> > I overrode the computeSplitSize() method and forced the size to be a
> > multiple of my object size. It worked, but it seems that at certain points
> > of the output file objects are missing. And now I am thinking that this
> > could be my problem.
> >
> > Has anyone faced this problem before?
> >
> > Thank you.
> > Regards,
> > Teodor
> >
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com
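
For reference, the arithmetic behind the original poster's approach (forcing the split size to a multiple of the object size) can be sketched in plain Java as follows. This is an illustration under assumptions, not the poster's code and not Hadoop's implementation: the names are hypothetical, the objects are assumed to be fixed-size, and the split list is computed naively from the file length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of Hadoop.
final class RecordAlignedSplits {

    /** Rounds desiredSize down to a multiple of recordSize, but never below one record. */
    static long alignedSplitSize(long desiredSize, int recordSize) {
        if (recordSize <= 0) {
            throw new IllegalArgumentException("recordSize must be positive");
        }
        long aligned = (desiredSize / recordSize) * recordSize;
        return Math.max(aligned, recordSize);
    }

    /** Returns {start, length} pairs covering [0, fileLength) in chunks of splitSize. */
    static List<long[]> splits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last split may be shorter than splitSize.
            result.add(new long[] { start, Math.min(splitSize, fileLength - start) });
        }
        return result;
    }

    public static void main(String[] args) {
        int recordSize = 100;                                  // assumed object size in bytes
        long size = alignedSplitSize(64L * 1024 * 1024, recordSize);
        System.out.println("aligned split size: " + size);
        for (long[] s : splits(250, alignedSplitSize(120, 50))) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With this scheme every split starts on an object boundary only if each split before it has exactly the aligned size, so any code path that produces a split of a different length would break the alignment.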
 		 	   		  