hadoop-hdfs-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce
Date Thu, 15 Jan 2015 06:47:56 GMT
Have you considered implementing this with something like Spark? That could be
much easier than raw MapReduce.

On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <unmeshabiju@gmail.com>
wrote:

> In a KNN-like algorithm we need to load the model data into the cache in
> order to predict the records.
>
> Here is the example for KNN.
>
>
> [image: Inline image 1]
>
> So if the model is a large file, say 1 or 2 GB, we will not be able to load
> it into the distributed cache.
>
> One way is to split/partition the model result into several files, perform
> the distance calculation for all records in each file, and then find the
> minimum distance and the most frequent class label to predict the outcome.
>
> How can we partition the file and perform the operation on these partitions?
>
> i.e. 1st record <Distance> partition1, partition2, ...
>      2nd record <Distance> partition1, partition2, ...
>
> This is what came to mind.
>
> Is there any other way?
>
> Any pointers would help me.
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>
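
The partition-and-merge idea in the quoted message can be sketched in plain
Python. This is only an illustration under stated assumptions, not Hadoop or
Spark code: the function names (partition, local_top_k, predict), the toy
model, and the use of Euclidean distance are all hypothetical choices made
for the example. Each chunk plays the role of one model partition, the
per-chunk search stands in for a map step, and the merge plus majority vote
stands in for the reduce step.

```python
import heapq
import math
from collections import Counter


def partition(model, n_parts):
    """Split the model rows into n_parts roughly equal chunks."""
    return [model[i::n_parts] for i in range(n_parts)]


def local_top_k(query, chunk, k):
    """Map-like step: the k nearest (distance, label) pairs within one chunk."""
    scored = ((math.dist(query, features), label) for features, label in chunk)
    return heapq.nsmallest(k, scored)


def predict(query, chunks, k):
    """Reduce-like step: merge per-chunk candidates, keep the global k
    nearest, and return the most frequent class label among them."""
    candidates = [pair for chunk in chunks for pair in local_top_k(query, chunk, k)]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy model (assumed data): (feature vector, class label) rows.
model = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
chunks = partition(model, 3)
print(predict((1.1, 1.0), chunks, k=3))
```

Keeping k candidates per chunk is enough, because any of the global k nearest
neighbours must also be among the k nearest inside its own chunk. Only k
small (distance, label) pairs per partition then need to be merged, and the
full model never has to fit in a single cache.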
