hadoop-user mailing list archives

From qiaoresearcher <qiaoresearc...@gmail.com>
Subject how to feed sampled data into each mapper
Date Thu, 27 Feb 2014 19:20:33 GMT
Assume there is one large data set of 100 GB on HDFS. How can we ensure
that the input sent to each mapper is around 10% of the original data
(about 10 GB), and that each 10% is randomly sampled from the full 100 GB
data set? Is there any example code that does this?
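
To make the requirement concrete, here is a rough sketch in plain Java of
what "each mapper gets its own random 10%" could mean: every simulated
mapper keeps each record independently with probability 0.1, using its own
seed. This is not Hadoop API code. The class name MapperSampleSketch, the
method bernoulliSample, and the seeds are hypothetical names made up for
illustration, and the data is an in-memory list rather than a file on HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: simulates per-mapper random sampling in memory.
public class MapperSampleSketch {

    // Keep each record independently with probability `fraction` (Bernoulli sampling).
    // The sample size is therefore only approximately fraction * records.size().
    public static <T> List<T> bernoulliSample(List<T> records, double fraction, long seed) {
        Random rng = new Random(seed);
        List<T> kept = new ArrayList<>();
        for (T record : records) {
            if (rng.nextDouble() < fraction) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the full data set (assumption: records fit in a list).
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            data.add(i);
        }

        // Each simulated mapper draws its own sample with a different seed,
        // so the samples are independent and may overlap.
        for (int mapperId = 0; mapperId < 3; mapperId++) {
            List<Integer> sample = bernoulliSample(data, 0.10, 42L + mapperId);
            System.out.println("mapper " + mapperId + " got " + sample.size() + " records");
        }
    }
}
```

Note that sampling this way gives roughly, not exactly, 10% per mapper, and
that the same record can appear in more than one mapper's sample.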

