From: unmesha sreeveni <unmeshabiju@gmail.com>
Date: Thu, 15 Jan 2015 12:34:10 +0530
Subject: Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce
To: User Hadoop <user@hadoop.apache.org>
Cc: user@mahout.apache.org

Yes, one of my friends is implementing the same. I know global sharing of
data is not possible across Hadoop MapReduce, but I need to check whether it
can somehow be done in Hadoop MapReduce as well, because I have also found
some papers on KNN in Hadoop, and I am trying to compare the performance.
Hope some pointers can help me.

On Thu, Jan 15, 2015 at 12:17 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Have you considered implementing this using something like Spark? That
> could be much easier than raw map-reduce.
>
> On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <unmeshabiju@gmail.com>
> wrote:
>
>> In a KNN-like algorithm we need to load the model data into a cache for
>> predicting the records.
>>
>> Here is the example for KNN.
>>
>> [image: Inline image 1]
>>
>> So if the model is a large file, say 1 or 2 GB, will we be able to load
>> it into the distributed cache?
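
The distributed cache itself can usually ship a file of this size; the
tighter constraint is holding it in each task's heap. A minimal sketch of
reading a cached model in a mapper's setup(), assuming the file was
registered with Job.addCacheFile(...) and holds comma-separated features
with the class label last (the KnnMapper name and record format here are
hypothetical, not from the thread):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KnnMapper extends Mapper<LongWritable, Text, Text, Text> {

    // In-memory copy of the model; only works while it fits in the task heap.
    private final List<double[]> modelFeatures = new ArrayList<>();
    private final List<String> modelLabels = new ArrayList<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles(); // registered via Job.addCacheFile(...)
        if (cacheFiles == null || cacheFiles.length == 0) {
            return;
        }
        // The cached file is localized into the task's working directory,
        // so it can be opened by its bare name.
        String localName = new Path(cacheFiles[0].getPath()).getName();
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(","); // assumed format: f1,f2,...,fn,label
                double[] features = new double[parts.length - 1];
                for (int i = 0; i < features.length; i++) {
                    features[i] = Double.parseDouble(parts[i]);
                }
                modelFeatures.add(features);
                modelLabels.add(parts[parts.length - 1]);
            }
        }
    }
}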
>> One way is to split/partition the model result into several files,
>> perform the distance calculation for all the records in each file, then
>> find the minimum distance and the most frequently occurring class label,
>> and predict the outcome.
>>
>> How can we partition the file and perform the operation on these
>> partitions?
>>
>> i.e. 1st record <Distance> partition1, partition2, ...
>>      2nd record <Distance> partition1, partition2, ...
>>
>> This is what came to my mind.
>>
>> Is there any further way?
>>
>> Any pointers would help me.
>>
>> --
>> Thanks & Regards
>>
>> Unmesha Sreeveni U.B
>> Hadoop, Bigdata Developer
>> Centre for Cyber Security | Amrita Vishwa Vidyapeetham
>> http://www.unmeshasreeveni.blogspot.in/
>

--
Thanks & Regards

Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
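
P.S. On the partition idea quoted above: if each map task scores a test
record against only its own partition of the model and emits its k best
candidates per record as "distance,label" values, one reduce call per
record can merge them, keep the global k nearest, and take a majority vote.
A hedged sketch of that merge step, assuming that value format; KnnReducer
and K are hypothetical names, and the mapper side is not shown:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class KnnReducer extends Reducer<Text, Text, Text, Text> {

    private static final int K = 5; // assumed neighbourhood size

    @Override
    protected void reduce(Text recordId, Iterable<Text> candidates, Context context)
            throws IOException, InterruptedException {
        // Max-heap on distance: the root is the worst of the current k best,
        // so it is evicted whenever a closer candidate arrives.
        PriorityQueue<String[]> nearest = new PriorityQueue<>(K,
                (a, b) -> Double.compare(Double.parseDouble(b[0]),
                                         Double.parseDouble(a[0])));

        for (Text candidate : candidates) {
            String[] distLabel = candidate.toString().split(","); // "distance,label"
            if (nearest.size() < K) {
                nearest.add(distLabel);
            } else if (Double.parseDouble(distLabel[0])
                       < Double.parseDouble(nearest.peek()[0])) {
                nearest.poll();
                nearest.add(distLabel);
            }
        }

        // Majority vote over the labels of the k global nearest neighbours.
        Map<String, Integer> votes = new HashMap<>();
        for (String[] neighbour : nearest) {
            votes.merge(neighbour[1], 1, Integer::sum);
        }
        String predicted = votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();

        context.write(recordId, new Text(predicted));
    }
}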