hive-user mailing list archives

From qiaoresearcher <>
Subject how to feed sample of data to each mapper
Date Thu, 27 Feb 2014 06:02:34 GMT
Assume there is one large data set of 100 GB on HDFS. How can I make sure
that each mapper receives roughly 10 GB of data, and that those 10 GB are
randomly sampled from the full 100 GB data set? Is there any Mahout sample
code that does this?
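
To make the requirement concrete, here is a minimal sketch in plain Python (not Hadoop or Mahout code) of the behaviour I am after. It assumes Bernoulli sampling, where each record is kept independently with probability 0.10, so every mapper ends up with approximately, not exactly, 10% of the data. The function name `sample_for_mapper` and the one-seed-per-mapper scheme are hypothetical and only for illustration.

```python
import random


def sample_for_mapper(records, fraction, seed):
    """Keep each record independently with probability `fraction`."""
    # A separate seed per mapper so that mappers draw different samples.
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


# Toy stand-in for the 100 GB data set: 100,000 record ids.
records = range(100_000)

# Each hypothetical mapper draws its own ~10% sample of the whole data set.
samples = {
    mapper_id: sample_for_mapper(records, 0.10, seed=mapper_id)
    for mapper_id in range(3)
}

for mapper_id, sample in samples.items():
    print(mapper_id, len(sample))
```

The sample sizes vary slightly from mapper to mapper because each record is kept or dropped independently. An exact 10 GB per mapper would need a different scheme.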

Any comments will be appreciated.

