hadoop-hdfs-user mailing list archives

From qiaoresearcher <qiaoresearc...@gmail.com>
Subject Re: how to feed sampled data into each mapper
Date Fri, 28 Feb 2014 02:14:11 GMT
Thanks. I think what you suggest is to just divide the large file into
splits of about 10G each, but how do we make each 10G split 'randomly
sampled' from the original large data set?
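(The thread leaves this open, but a common way to get a random sample without controlling split boundaries is Bernoulli sampling inside the mapper: each mapper reads its ordinary split and keeps every record independently with probability 0.1, so the union of all mapper outputs is roughly 10% of the input and every record has an equal chance of being picked. A minimal sketch, written as a Hadoop Streaming-style mapper; the `SAMPLE_FRACTION` constant and `sample_lines` helper are illustrative names, not part of any Hadoop API:)

```python
import random
import sys

SAMPLE_FRACTION = 0.10  # assumed target: keep ~10% of the records


def sample_lines(lines, fraction=SAMPLE_FRACTION, seed=None):
    """Bernoulli sampling: keep each record independently with
    probability `fraction`.

    Over a 100G input this yields roughly 10G of sampled records, and
    every record has the same chance of being chosen no matter which
    split (and hence which mapper) it lands in.
    """
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < fraction:
            yield line


if __name__ == "__main__":
    # Used as a Hadoop Streaming mapper: read stdin, emit sampled lines.
    for line in sample_lines(sys.stdin):
        sys.stdout.write(line)
```

(Note the sample size is only approximately 10%; if you need exactly 10G you would need a second pass or reservoir sampling per mapper followed by a merge.)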

On Thu, Feb 27, 2014 at 7:40 PM, Hadoop User <hadoopuser5@gmail.com> wrote:

> Try changing the split size in the driver code; see the MapReduce
> split-size properties (e.g. mapreduce.input.fileinputformat.split.maxsize).
> Sent from my iPhone
> On Feb 27, 2014, at 11:20 AM, qiaoresearcher <qiaoresearcher@gmail.com>
> wrote:
> Assume there is one large data set of size 100G on HDFS. How can we
> control things so that the data sent to each mapper is around 10% of the
> original data (i.e. 10G), and that each 10% is randomly sampled from the
> 100G data set? Is there any example code for doing this?
> Regards,
