hadoop-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject Split files into 80% and 20% for building model and prediction
Date Fri, 12 Dec 2014 09:30:27 GMT
I am trying to divide my HDFS file into two files,
80% and 20%, for a classification algorithm (80% for building the model and
20% for prediction).
Please suggest how to do this.
To write 80% and 20% to two separate files, we need to know the exact number
of records in the data set,
and that is only known after going through the data set once.
So we need to write one MapReduce job just to count the number of records,
and a second MapReduce job to separate the 80% and 20% into two files using
MultipleOutputs.
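
A minimal local sketch of the two passes described above, written in plain
Python rather than as real MapReduce jobs. The function names, the sample
records, and the integer percentage parameter are all hypothetical. It
assumes the records are read in one fixed order by a single process, which
in a real job would only hold with a single reducer.

```python
def count_records(records):
    """Pass 1: count the records (stands in for the counting job)."""
    return sum(1 for _ in records)


def split_records(records, total, train_percent=80):
    """Pass 2: send the first train_percent of records to one output
    and the remainder to the other (stands in for the splitting job)."""
    # Integer arithmetic avoids floating-point rounding at the cutoff.
    cutoff = total * train_percent // 100
    train, test = [], []
    for index, record in enumerate(records):
        if index < cutoff:
            train.append(record)
        else:
            test.append(record)
    return train, test


# Hypothetical sample data: ten records.
records = [f"row-{i}" for i in range(10)]
total = count_records(records)
train, test = split_records(records, total)
print(len(train), len(test))
```

Here the cutoff computed from the first pass is what tells the second pass
when the 80% output is full: a running index is compared against it for
every record.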


Am I on the right track, or is there an alternative?
One point still confuses me: how do I check when the reducer has received
80% of the data?


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
