From: unmesha sreeveni <unmeshabiju@gmail.com>
Date: Wed, 21 Jan 2015 11:37:00 +0530
Subject: Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce
To: User Hadoop <user@hadoop.apache.org>

I have 4 nodes and the replication factor is set to 3.

On Wed, Jan 21, 2015 at 11:15 AM, Drake 민영근 <drake.min@nexr.com> wrote:

> Yes, almost the same. I assume the most time-consuming part was copying the
> model data from the datanode which holds it to the actual processing node
> (tasktracker or nodemanager).
>
> How about the model data's replication factor? How many nodes do you have?
> If you have 4 or more nodes, you can increase the replication with the
> following command. I suggest a number equal to your number of datanodes,
> but first you should confirm there is enough space in HDFS.
>
>    - hdfs dfs -setrep -w 6 /user/model/data
>
> Drake 민영근 Ph.D
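(For reference, the same replication change can also be made from Java through
the FileSystem API. This is only a rough sketch; the class name is made up, and
the path and target replication are just the example values from the command
above.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaiseModelReplication {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path model = new Path("/user/model/data");   // example path from the thread

        // Check the current replication before changing it
        FileStatus status = fs.getFileStatus(model);
        System.out.println("current replication: " + status.getReplication());

        // Raise replication to 6, like `hdfs dfs -setrep -w 6 /user/model/data`.
        // setReplication only requests the change; HDFS re-replicates in the background.
        fs.setReplication(model, (short) 6);
      }
    }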
> On Wed, Jan 21, 2015 at 2:12 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
>
>> Yes, I tried the same, Drake.
>>
>> I don't know if I understood your answer.
>>
>> Instead of loading the model into setup() through the cache, I read it
>> directly from HDFS in the map section, and for each incoming record I found
>> the distance to all the records in HDFS.
>> i.e., if R and S are my datasets, R is the model data stored in HDFS,
>> and when S is taken for processing:
>> S1-R (finding distance with the whole R set)
>> S2-R
>>
>> But it is taking a long time as it needs to compute all the distances.
>>
>> On Wed, Jan 21, 2015 at 10:31 AM, Drake 민영근 <drake.min@nexr.com> wrote:
>>
>>> In my suggestion, map or reduce tasks do not use the distributed cache.
>>> They use the file directly from HDFS with short-circuit local reads. It is
>>> like a shared-storage method, but almost every node has the data because
>>> of the high replication factor.
>>>
>>> Drake 민영근 Ph.D
>>>
>>> On Wed, Jan 21, 2015 at 1:49 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
>>>
>>>> But still, if the model is very large, how can we load it into the
>>>> Distributed cache or something like that?
>>>> Here is one source:
>>>> http://www.cs.utah.edu/~lifeifei/papers/knnslides.pdf
>>>> But it is confusing me.
>>>>
>>>> On Wed, Jan 21, 2015 at 7:30 AM, Drake 민영근 <drake.min@nexr.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How about this? The large model data stays in HDFS, but with many
>>>>> replicas, and the MapReduce program reads the model from HDFS. In theory,
>>>>> if the replication factor of the model data equals the number of data
>>>>> nodes, then with the Short Circuit Local Reads feature of the HDFS
>>>>> datanode, the map or reduce tasks read the model data from their own
>>>>> local disks.
>>>>>
>>>>> This way may use a lot of HDFS space, but the annoying partition problem
>>>>> will be gone.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Drake 민영근 Ph.D
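(A minimal sketch of that approach, with a hypothetical class name and job
property "knn.model.path": the model is read straight from HDFS once per task
in setup(), assuming it fits in the task's memory, and each incoming record is
scored against it in map(). Short-circuit local reads are enabled on the
cluster side, typically via dfs.client.read.shortcircuit and
dfs.domain.socket.path in hdfs-site.xml, so the mapper code itself does not
change.)

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: loads the model (R) from HDFS once per task in setup(),
    // then scores each incoming test record (S) against it in map().
    public class KnnHdfsMapper extends Mapper<LongWritable, Text, Text, Text> {

      private final List<double[]> model = new ArrayList<>();

      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        // "knn.model.path" is an example job property, not a standard Hadoop key
        Path modelPath = new Path(conf.get("knn.model.path", "/user/model/data"));
        FileSystem fs = modelPath.getFileSystem(conf);
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(modelPath)))) {
          String line;
          while ((line = reader.readLine()) != null) {
            model.add(parseFeatures(line));   // parse once, reuse for every map() call
          }
        }
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        double[] s = parseFeatures(value.toString());
        double best = Double.MAX_VALUE;
        int bestIndex = -1;
        for (int i = 0; i < model.size(); i++) {   // S_i against the whole R set
          double d = euclidean(s, model.get(i));
          if (d < best) { best = d; bestIndex = i; }
        }
        context.write(new Text(value.toString()), new Text(bestIndex + "\t" + best));
      }

      private static double[] parseFeatures(String line) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) v[i] = Double.parseDouble(parts[i]);
        return v;
      }

      private static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
          double diff = a[i] - b[i];
          sum += diff * diff;
        }
        return Math.sqrt(sum);
      }
    }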
>>>>> On Thu, Jan 15, 2015 at 6:05 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
>>>>>
>>>>>> Is there any way? Waiting for a reply. I have posted the question
>>>>>> everywhere, but no one is responding. I feel like this is the right place
>>>>>> to ask doubts, as some of you may have come across the same issue and got
>>>>>> stuck.
>>>>>>
>>>>>> On Thu, Jan 15, 2015 at 12:34 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, one of my friends is implementing the same. I know global sharing
>>>>>>> of data is not possible across Hadoop MapReduce, but I need to check
>>>>>>> whether it can somehow be done in Hadoop MapReduce as well, because I
>>>>>>> found some papers on KNN in Hadoop too. And I am trying to compare the
>>>>>>> performance as well.
>>>>>>>
>>>>>>> Hope some pointers can help me.
>>>>>>>
>>>>>>> On Thu, Jan 15, 2015 at 12:17 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>>>
>>>>>>>> Have you considered implementing this using something like Spark? That
>>>>>>>> could be much easier than raw map-reduce.
>>>>>>>>
>>>>>>>> On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> In a KNN-like algorithm we need to load the model data into a cache
>>>>>>>>> for predicting the records.
>>>>>>>>>
>>>>>>>>> Here is the example for KNN.
>>>>>>>>>
>>>>>>>>> [image: Inline image 1]
>>>>>>>>>
>>>>>>>>> So if the model is a large file, say 1 or 2 GB, we will not be able to
>>>>>>>>> load it into the Distributed cache.
>>>>>>>>>
>>>>>>>>> One way is to split/partition the model result into some files,
>>>>>>>>> perform the distance calculation for all records in each file, and
>>>>>>>>> then find the minimum distance and the maximum occurrence of the
>>>>>>>>> class label to predict the outcome.
>>>>>>>>>
>>>>>>>>> How can we partition the file and perform the operation on these
>>>>>>>>> partitions?
>>>>>>>>>
>>>>>>>>> i.e., 1st record <Distance> partition1, partition2, ....
>>>>>>>>>       2nd record <Distance> partition1, partition2, ...
>>>>>>>>>
>>>>>>>>> This is what came to my mind.
>>>>>>>>>
>>>>>>>>> Is there any other way?
>>>>>>>>>
>>>>>>>>> Any pointers would help me.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks & Regards
>>>>>>>>>
>>>>>>>>> Unmesha Sreeveni U.B
>>>>>>>>> Hadoop, Bigdata Developer
>>>>>>>>> Centre for Cyber Security | Amrita Vishwa Vidyapeetham
>>>>>>>>> http://www.unmeshasreeveni.blogspot.in/

--
Thanks & Regards

Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
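(A rough sketch of the partitioned approach described in the original question
above, with hypothetical class names and an example k: assume each map task
scans one partition of the model and emits, for every test record, its local
candidates as "distance<TAB>label" keyed by the test record id; a reducer then
keeps the k globally nearest candidates and predicts by majority vote.)

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.PriorityQueue;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hypothetical reducer: merges per-partition candidates for one test record,
    // keeps the k globally nearest, and predicts by majority vote over their labels.
    public class KnnMergeReducer extends Reducer<Text, Text, Text, Text> {

      private static final int K = 5;   // example value; make it a job property in practice

      private static class Candidate {
        final double distance;
        final String label;
        Candidate(double distance, String label) { this.distance = distance; this.label = label; }
      }

      @Override
      protected void reduce(Text testRecordId, Iterable<Text> candidates, Context context)
          throws IOException, InterruptedException {
        // Max-heap on distance so the farthest of the current k can be evicted cheaply
        PriorityQueue<Candidate> nearest =
            new PriorityQueue<>(K, (a, b) -> Double.compare(b.distance, a.distance));
        for (Text candidate : candidates) {
          String[] parts = candidate.toString().split("\t");
          nearest.add(new Candidate(Double.parseDouble(parts[0]), parts[1]));
          if (nearest.size() > K) {
            nearest.poll();   // drop the farthest of the k+1
          }
        }
        // Majority vote over the labels of the k nearest neighbours
        Map<String, Integer> votes = new HashMap<>();
        for (Candidate c : nearest) {
          votes.merge(c.label, 1, Integer::sum);
        }
        String predicted = null;
        int best = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
          if (e.getValue() > best) { best = e.getValue(); predicted = e.getKey(); }
        }
        if (predicted != null) {
          context.write(testRecordId, new Text(predicted));
        }
      }
    }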