hadoop-mapreduce-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject How to partition a file to smaller size for performing KNN in hadoop mapreduce
Date Thu, 15 Jan 2015 06:06:55 GMT
In a KNN-like algorithm we need to load the model data into the cache in
order to predict the records.

Here is an example of KNN.


[image: Inline image 1]

So if the model is a large file, say 1 or 2 GB, will we be able to load
it into the Distributed Cache?

One way is to split/partition the model into several files, perform the
distance calculation for every record against each of those files, and then
find the minimum distances and the most frequent class label among them to
predict the outcome.
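
A minimal sketch of this idea in plain Python (not Hadoop code), assuming
each partition is a list of (features, label) rows and Euclidean distance is
used. The names local_top_k and predict, and the toy data, are hypothetical
and only for illustration. Each partition contributes its k nearest rows, and
the merged candidates are reduced to the global k nearest before voting:

```python
import heapq
import math
from collections import Counter


def local_top_k(record, partition, k):
    """Return the k nearest (distance, label) pairs within one partition."""
    scored = ((math.dist(record, features), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def predict(record, partitions, k):
    """Merge per-partition candidates, keep the global k nearest, then vote."""
    candidates = []
    for partition in partitions:
        candidates.extend(local_top_k(record, partition, k))
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy model, split into two partitions of (features, label) rows.
partitions = [
    [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")],
    [((0.9, 1.1), "A"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")],
]
print(predict((1.0, 0.9), partitions, k=3))  # prints "A"
```

Keeping only k candidates per partition is enough, because any row among the
global k nearest must also be among the k nearest of its own partition.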

How can we partition the file and perform the operation on these partitions?
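
For the splitting step alone, a rough sketch in plain Python (again not
Hadoop code), assuming the model is a text file with one row per line. The
function name split_file, the part-NNNNN file names and the sample rows are
hypothetical:

```python
import itertools
import os
import tempfile


def split_file(path, lines_per_part, out_dir):
    """Write consecutive chunks of `lines_per_part` lines to separate part files."""
    part_paths = []
    with open(path) as src:
        for index in itertools.count():
            chunk = list(itertools.islice(src, lines_per_part))
            if not chunk:
                break  # source file is exhausted
            part_path = os.path.join(out_dir, f"part-{index:05d}")
            with open(part_path, "w") as dst:
                dst.writelines(chunk)
            part_paths.append(part_path)
    return part_paths


# Hypothetical five-row model file, split into parts of two rows each.
with tempfile.TemporaryDirectory() as tmp:
    model = os.path.join(tmp, "model.csv")
    with open(model, "w") as f:
        f.write("1.0,1.0,A\n1.2,0.8,A\n5.0,5.0,B\n0.9,1.1,A\n5.2,4.9,B\n")
    part_names = [os.path.basename(p) for p in split_file(model, 2, tmp)]
    print(part_names)  # ['part-00000', 'part-00001', 'part-00002']
```

Each part can then be processed independently, with only one part held in
memory at a time.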

i.e. 1st record <Distance> partition1, partition2, ...
     2nd record <Distance> partition1, partition2, ...

This is what came to mind.

Is there any other way?

Any pointers would help me.

-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
