From: Sanjay Subramanian
To: user@hive.apache.org
Date: Mon, 20 Apr 2015 17:41:22 +0000 (UTC)
Subject: Using Hive as a file comparison and grep-ping tool
hey guys

As data wranglers and programmers we often need quick tools. One such tool I need almost every day is one that greps a file based on the contents of another file. One could write this in Perl or Python, but since I am already using the Hadoop ecosystem extensively, I thought: why not do it in Hive?
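For small local files, this "grep one file by the contents of another" job is also a grep one-liner; a quick sketch with made-up sample files (not part of the Hive workflow below):

```shell
# Two throwaway sample files (hypothetical data, for illustration only)
printf '10.0.0.1\n10.0.0.2\n' > /tmp/registered.txt
printf '10.0.0.2\n10.0.0.3\n' > /tmp/visitors.txt

# -F: fixed strings, -x: whole-line match, -f: read patterns from a file.
# Prints the lines of visitors.txt that appear verbatim in registered.txt.
grep -F -x -f /tmp/registered.txt /tmp/visitors.txt
# prints: 10.0.0.2
```

This stops scaling once the pattern file no longer fits comfortably in memory, which is where the Hive approach earns its keep.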

Perhaps you guys already know this and have better solutions... nevertheless :-) here goes...

Best regards

sanjay
(Hive super-fan)

I just posted this on my blog:
https://bigdatalatte.wordpress.com/2015/04/20/using-hive-as-a-file-comparison-and-grep-ping-tool/

In case the blog URL does not work for any reason, here is the logic

Using Hive as a file comparison and grep-ping tool
==================================================
1. Log on to the Linux terminal from which you run Hive queries

2. Create a database called "myutils" in Hive
   Create two Hive tables, myutils.file1 and myutils.file2
   - each of these tables will have a partition column called "fn" ----> fn is short for "filename"
   - each of these tables will have just one data column called "ln" ----> ln is short for "line"
   An easy script to do that would be as follows

    for r in 1 2 ; do hive -e "CREATE DATABASE IF NOT EXISTS myutils; USE myutils; DROP TABLE IF EXISTS file${r}; CREATE EXTERNAL TABLE IF NOT EXISTS file${r}(ln STRING) PARTITIONED BY (fn STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"; done
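Unrolled, the loop issues the same DDL for each table; shown here for file1 only:

```sql
CREATE DATABASE IF NOT EXISTS myutils;
USE myutils;
DROP TABLE IF EXISTS file1;
CREATE EXTERNAL TABLE IF NOT EXISTS file1 (ln STRING)
PARTITIONED BY (fn STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```

EXTERNAL matters here: dropping the table later removes only the metadata, not the files sitting in HDFS.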

3. Create a permanent base location folder in HDFS
   hdfs dfs -mkdir -p /workspace/myutils/filecomparator/file1/
   hdfs dfs -mkdir -p /workspace/myutils/filecomparator/file2/

USECASE 1:
===========
Search whether a bunch of IP addresses exists in another file containing a (larger) bunch of IPs

[1] registeredIPs.txt
    10.456.34.90
    123.675.654.1
    21.87.657.456
    234.109.34.234

    visitorIPs.txt
    10.456.34.90
    12.367.54.23
    218.7.657.456
    23.4.109.3
[2] Output which IPs in File1 are present in File2
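For reference, the same "which IPs appear in both files" question can be answered locally with coreutils when the files are small (hypothetical sample data; `comm` requires sorted input):

```shell
# Hypothetical sample files, just to show the shape of the output
printf '1.1.1.1\n2.2.2.2\n3.3.3.3\n' > /tmp/registeredIPs.txt
printf '4.4.4.4\n2.2.2.2\n' > /tmp/visitorIPs.txt

# comm needs sorted input; -12 suppresses the lines unique to either file,
# leaving only the lines common to both.
sort /tmp/registeredIPs.txt > /tmp/registered.sorted
sort /tmp/visitorIPs.txt > /tmp/visitors.sorted
comm -12 /tmp/registered.sorted /tmp/visitors.sorted
# prints: 2.2.2.2
```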

[3] Put each file in a separate HDFS location

    hdfs dfs -mkdir -p /workspace/myutils/filecomparator/file1/registeredIPs.txt
    hdfs dfs -mkdir -p /workspace/myutils/filecomparator/file2/visitorIPs.txt

    hdfs dfs -put registeredIPs.txt /workspace/myutils/filecomparator/file1/registeredIPs.txt
    hdfs dfs -put visitorIPs.txt /workspace/myutils/filecomparator/file2/visitorIPs.txt

[4] Add a partition to myutils.file1 and myutils.file2
    For simplicity keep the partition names identical to the file names themselves

    hive -e "USE myutils; ALTER TABLE file1 ADD PARTITION(fn='registeredIPs.txt') LOCATION '/workspace/myutils/filecomparator/file1/registeredIPs.txt'"

    hive -e "USE myutils; ALTER TABLE file2 ADD PARTITION(fn='visitorIPs.txt') LOCATION '/workspace/myutils/filecomparator/file2/visitorIPs.txt'"
    
[5] Check that the partitions can be accessed by Hive

    # This should give you the same answer as
    # wc -l registeredIPs.txt
    hive -e "SELECT count(*) FROM myutils.file1 WHERE fn='registeredIPs.txt'"

    # This should give you the same answer as
    # wc -l visitorIPs.txt
    hive -e "SELECT count(*) FROM myutils.file2 WHERE fn='visitorIPs.txt'"

[6] Find the IPs in registeredIPs.txt that are also in visitorIPs.txt

# This dumps to the local file system
hive -e "SELECT f1.ln FROM (SELECT ln FROM myutils.file1 WHERE fn='registeredIPs.txt') f1 JOIN (SELECT ln FROM myutils.file2 WHERE fn='visitorIPs.txt') f2 ON trim(f1.ln) = trim(f2.ln)" > ./registered_in_visitors_list.txt

# This dumps to a new "internally-managed-by-Hive" table
# Make sure you don't already have a valuable Hive table called "myutils.registered_in_visitors_list" - otherwise this will replace that table with the results of this query
hive -e "USE myutils; DROP TABLE IF EXISTS registered_in_visitors_list; CREATE TABLE IF NOT EXISTS registered_in_visitors_list AS SELECT f1.ln FROM (SELECT ln FROM myutils.file1 WHERE fn='registeredIPs.txt') f1 JOIN (SELECT ln FROM myutils.file2 WHERE fn='visitorIPs.txt') f2 ON trim(f1.ln) = trim(f2.ln)"

# This dumps to a directory on HDFS
# Make sure you don't already have a valuable directory called "registered_in_visitors_list" - otherwise this will overwrite that directory and all its contents with the results of this query
hive -e "INSERT OVERWRITE DIRECTORY '/workspace/myutils/filecomparator/registered_in_visitors_list' SELECT f1.ln FROM (SELECT ln FROM myutils.file1 WHERE fn='registeredIPs.txt') f1 JOIN (SELECT ln FROM myutils.file2 WHERE fn='visitorIPs.txt') f2 ON trim(f1.ln) = trim(f2.ln)"
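If you want to sanity-check the join logic without a cluster, the trim-and-match semantics of the query can be mimicked locally with awk (hypothetical sample files; a quick stand-in, not part of the Hive workflow):

```shell
# Hypothetical sample data; note the stray whitespace around the first IP
printf ' 1.1.1.1 \n2.2.2.2\n' > /tmp/reg.txt
printf '1.1.1.1\n3.3.3.3\n' > /tmp/vis.txt

# First pass (visitors): remember each trimmed line.
# Second pass (registered): print lines whose trimmed form was seen,
# mirroring ON trim(f1.ln) = trim(f2.ln) in the Hive query.
awk '{ t = $0; gsub(/^[ \t]+|[ \t]+$/, "", t) }
     NR == FNR { seen[t] = 1; next }
     t in seen' /tmp/vis.txt /tmp/reg.txt
```

Like `SELECT f1.ln`, this prints the registered-side line untrimmed, so the match for ` 1.1.1.1 ` keeps its surrounding whitespace.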


 
 