Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E4F8B1800B for ; Mon, 24 Aug 2015 20:57:57 +0000 (UTC) Received: (qmail 83182 invoked by uid 500); 24 Aug 2015 20:57:55 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 83111 invoked by uid 500); 24 Aug 2015 20:57:55 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 83101 invoked by uid 99); 24 Aug 2015 20:57:55 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 24 Aug 2015 20:57:55 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 58D911AAC58 for ; Mon, 24 Aug 2015 20:57:55 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.15 X-Spam-Level: *** X-Spam-Status: No, score=3.15 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=3] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id tt4RgIJeTZPK for ; Mon, 24 Aug 2015 20:57:47 +0000 (UTC) Received: from mail-qk0-f171.google.com (mail-qk0-f171.google.com [209.85.220.171]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 3273D42BAB for ; Mon, 24 Aug 2015 20:57:47 +0000 (UTC) Received: by qkbm65 with SMTP id m65so87874748qkb.2 for ; Mon, 24 Aug 2015 13:57:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=IfMu1ZToTz+kO05HumIOArWL25nNYHMWH5ZBtitwt+U=; b=qArxTAq1aewegU2XFTuTXf1mN/OyRCMhaLtr4x5PKR/OFkHXxJRYs7Y6NEtN46NUzp oD/JBCpoJtkDYMCX+42SbHw4BGtxVdLpgpv94lJ9WugFFOPmD7uOcR3DyVN5dJo+gyqi U1HTIslcQpr4sVwCPF9N3mqR+FS63wOf/1bMAkODU7ANPD0ViZqK6bKAkNf7DFkz5aOn PE5MnzSF+bM49DNWIacfEy36af9U7HLBYjbGFwTvh3GZFDqFCs+X1gdSusJchW3VhbCW RbHSI/LtQUDv+v/QeSi4SlJnL4v1KHOswy7zh0xctEllmDVYykBlfkr5wfCM7NJzR+Rp kEag== MIME-Version: 1.0 X-Received: by 10.170.221.137 with SMTP id n131mr32488726ykf.83.1440449866856; Mon, 24 Aug 2015 13:57:46 -0700 (PDT) Received: by 10.37.7.1 with HTTP; Mon, 24 Aug 2015 13:57:46 -0700 (PDT) In-Reply-To: <234386700.6421306.1440443731479.JavaMail.yahoo@mail.yahoo.com> References: <234386700.6421306.1440443731479.JavaMail.yahoo@mail.yahoo.com> Date: Mon, 24 Aug 2015 13:57:46 -0700 Message-ID: Subject: Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW From: Vivek Veeramani To: user@hive.apache.org, Sanjay Subramanian Content-Type: multipart/alternative; boundary=001a113a33d4a17e95051e14de0b --001a113a33d4a17e95051e14de0b Content-Type: text/plain; charset=UTF-8 Hi Sanjay, Try replacing the 4 backslashes with just 2. Usually works with 2 backslashes. Replace this *split(reverse(split(reverse(**floc**),'/')[0]),'\\\\.')[0]* as resid with *split(reverse(split(reverse(floc**),'/')[0]),'\\.')[0]* as resid Please have a look and let us know if that helps. Best , Vivek Veeramani cell : + 1-415 996 7853 On Mon, Aug 24, 2015 at 12:15 PM, Sanjay Subramanian < sanjaysubramanian@yahoo.com> wrote: > Hi guys > > I am using > > Hive version = 0.13.1-cdh5.3.3 > > > > *HIVE TABLE = qnap_resume_file_location* > *---------------------------------------* > DROP TABLE IF EXISTS > qnap_resume_file_location > ; > CREATE EXTERNAL TABLE qnap_resume_file_location ( > floc STRING > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/data/ > myfirm/file_location' > ; > > *SAMPLE DATA = qnap_resume_file_location* > *---------------------------------* > "select * from resume.qnap_resume_file_location limit 10" > /mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html > /mnt/myfirm/200901/14008_1_1004944724_0_1005334758.html > /mnt/myfirm/200901/14000__900719077_10_901309048.doc > /mnt/myfirm/200901/14012_6_1004858088_0_1005227990.doc > /mnt/myfirm/200901/14008_1_1004811416_0_1005258541.html > /mnt/myfirm/200901/14008_1_501597002_6_1005224039.html > /mnt/myfirm/200901/14009_5_1004872908_0_1005244348.doc > /mnt/myfirm/200901/14001_7_14165_1_66775.docx > /mnt/myfirm/200901/14009_5_1004935267_0_1005322091.txt > /mnt/myfirm/200901/14012_6_1004904422_0_1005283729.doc > > > *HIVE VIEW = qnap_resume_resid_file_loc_map_vw* > *---------------------------------------------* > DROP VIEW IF EXISTS > qnap_resume_resid_file_loc_map_vw > ; > CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw ( > resid , > yyyymm , > floc > ) > AS > SELECT > split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid, > reverse(split(reverse(floc),'/')[1]) as yyyymm, > floc > FROM > resume.qnap_resume_file_location > ; > > > *QUERY ON VIEW* > "select * from qnap_resume_resid_file_loc_map_vw limit 10" > *EXPECTED RESULTS* > *14012_6_1006686583_0_1102955123* 201205 > /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx > 14009_5_1008440384_0_1102887209 201205 > /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc > 14013_7_700200576_8_1102752594 201205 > /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc > 14015_8_1008470815_0_1102954525 201205 > /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc > 14009_5_1008373102_0_1102709973 201205 > /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc > 14011_1_1007981566_0_1102730546 201205 > /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt > 14015_8_1008436709_0_1102867682 201205 > /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx > 14001_7_1006347358_0_1007662042 201205 > /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx > 14009_5_1008382590_0_1102732450 201205 > /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc > 14000_6_1008189721_0_1102253740 201205 > /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx > > *ACTUAL RESULTS* (see the first column the DOT and extension are still > there) > *14012_6_1006686583_0_1102955123** .docx* 201205 > /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx > 14009_5_1008440384_0_1102887209 .doc 201205 > /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc > 14013_7_700200576_8_1102752594 .doc 201205 > /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc > 14015_8_1008470815_0_1102954525.doc 201205 > /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc > 14009_5_1008373102_0_1102709973 .doc 201205 > /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc > 14011_1_1007981566_0_1102730546 .txt 201205 > /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt > 14015_8_1008436709_0_1102867682 .docx 201205 > /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx > 14001_7_1006347358_0_1007662042 .docx 201205 > /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx > 14009_5_1008382590_0_1102732450 .doc 201205 > /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc > 14000_6_1008189721_0_1102253740 .docx 201205 > /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx > > *EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives u > correct results - but when placed inside a view then it does not seem to > work)* > SELECT > split(reverse(split(reverse(floc),'/')[0]),'\\\\.')[0] as resid, > reverse(split(reverse(floc),'/')[1]) as yyyymm, > floc > FROM > resume.qnap_resume_file_location > > 14012_6_1006686583_0_1102955123 201205 > /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx > 14009_5_1008440384_0_1102887209 201205 > /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc > 14013_7_700200576_8_1102752594 201205 > /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc > 14015_8_1008470815_0_1102954525 201205 > /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc > 14009_5_1008373102_0_1102709973 201205 > /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc > 14011_1_1007981566_0_1102730546 201205 > /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt > 14015_8_1008436709_0_1102867682 201205 > /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx > 14001_7_1006347358_0_1007662042 201205 > /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx > 14009_5_1008382590_0_1102732450 201205 > /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc > 14000_6_1008189721_0_1102253740 201205 > /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx > > > Thoughts ? Ideas ? > > thanks > > sanjay > > --001a113a33d4a17e95051e14de0b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Sanjay,

Try replacing the 4 backslas= hes with just 2. Usually works with 2 backslashes.

Replace this split(reverse(split(reverse(floc),'/&#= 39;)[0]),'\\\\.&= #39;)[0] as=C2=A0resid=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 with=C2=A0split(reverse(split(reverse(f= loc),'/')[0]),'\\.')[0] as=C2=A0resid

<= /span>
Please have a look and let us know if= that helps.



Best ,
Vivek Veeramani

cell : + 1-415 996 7853
<= div dir=3D"ltr">

On Mon, Aug 24, 2015 at 12:15 PM, Sanjay Sub= ramanian <sanjaysubramanian@yahoo.com> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">
Hi guys

I am using=C2=A0
Hive version =3D 0.13.1-cdh5.3.3

HIVE TABLE =3D=C2=A0qnap_resume_file_location
-------------------= --------------------
DROP=C2=A0 TABLE IF EXISTS
=C2=A0=C2=A0 =C2=A0 qnap_resume_file_location
;
CREATE EXTERNAL TABLE qnap_resume_file_location (
floc STRING
=C2=A0=C2=A0 =C2=A0 ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\= t'=C2=A0 LOCATION '/data/myfirm/file_location'
;

SAMPLE DATA =3D=C2=A0qnap_resume_file_location
---------------------------------
"sel= ect * from resume.qnap_resume_file_location limit 10"
/mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html
/mnt/myfirm/200901/14008_1_1004944724_0_100533475= 8.html
/mnt/myfirm/200901/14000__9007190= 77_10_901309048.doc
/mnt/myfirm/200901/1= 4012_6_1004858088_0_1005227990.doc
/mnt/= myfirm/200901/14008_1_1004811416_0_1005258541.html
/mnt/myfirm/200901/14008_1_501597002_6_1005224039.html
/mnt/myfirm/200901/14009_5_1004872908_0_100524434= 8.doc
/mnt/myfirm/200901/14001_7_14165_1= _66775.docx
/mnt/myfirm/200901/14009_5_1= 004935267_0_1005322091.txt
/mnt/myfirm/200901/14012_6_1004904422_0_100528= 3729.doc


HIVE VIEW =3D qnap_resume_res= id_file_loc_map_vw
---------------------------= ------------------
DROP VIEW IF EXISTS
=C2=A0=C2=A0 =C2=A0 qnap_resume_resid_file_loc_map_vw
;
CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw (
resid ,
yyyymm ,
floc=C2=A0
=C2=A0=C2=A0 =C2=A0 )
AS
SELECT
=C2=A0 =C2=A0 split(reverse(split(reverse(floc),'/= 9;)[0]),'\\\\.')[0] as resid,=C2=A0
=C2=A0 =C2=A0 reverse(split(reverse(floc),'/')[1]= )=C2=A0 as yyyymm,
=C2=A0 =C2=A0 floc
FROM=C2=A0
=C2=A0 resume.qnap_resume_file_location
;


QUERY ON VIEW
"select * from=C2=A0qnap_resume_resid_fil= e_loc_map_vw=C2=A0limit 10"
<= i>EXPECTED RESULTS
14= 012_6_1006686583_0_1102955123 = 201205 /mnt/myfirm/2= 01205/14012_6_1006686583_0_1102955123.docx
14009_5_10= 08440384_0_1102887209 20120= 5 /mnt/myfirm/201205/14009_= 5_1008440384_0_1102887209.doc
14013_7_700200576_8_110= 2752594 201205 /mnt/myfirm/201205/14013_7_700200576_= 8_1102752594.doc
14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.= doc
14009_5_1008373102_0_1102709973 201205 = /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc
14011_1_1007981566_0_1102730546 201205 /m= nt/myfirm/201205/14011_1_1007981566_0_1102730546.txt
= 14015_8_1008436709_0_1102867682 201205 /mnt/myfirm/201= 205/14015_8_1008436709_0_1102867682.docx
14001_7_1006= 347358_0_1007662042 201205<= span style=3D"white-space:pre-wrap"> /mnt/myfirm/201205/14001_7_= 1006347358_0_1007662042.docx
14009_5_1008382590_0_110= 2732450 201205 /mnt/myfirm/201205/14009_5_1008382590= _0_1102732450.doc
14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740= .docx

<= i>ACTUAL RESULTS (see the first column the DOT and extension are= still there)=C2=A0
14012_6_1006686583_0_11= 02955123 .docx 20= 1205 /mnt/myfirm/201205/140= 12_6_1006686583_0_1102955123.docx
14009_5_1008440384_= 0_1102887209 .doc 201205 /mnt/myfirm/201205/14009_5_100= 8440384_0_1102887209.doc
14013_7_700200576_8_11027525= 94 .doc 201205 /mnt/myfirm/201205/14013_7_700200576_= 8_1102752594.doc
14015_8_1008470815_0_1102954525.doc 201205 /mn= t/myfirm/201205/14015_8_1008470815_0_1102954525.doc
1= 4009_5_1008373102_0_1102709973 = .doc 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc
14011_1_1007981566_0_1102730546 .tx= t 201205 /mnt/myf= irm/201205/14011_1_1007981566_0_1102730546.txt
14015_= 8_1008436709_0_1102867682 .docx 201205 /mn= t/myfirm/201205/14015_8_1008436709_0_1102867682.docx
= 14001_7_1006347358_0_1007662042 .docx 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx
14009_5_1008382590_0_1102732450 = .doc 201205 = /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc
14000_6_1008189721_0_11022537= 40 .docx 201205= /mnt/myfirm/201205/14000_6= _1008189721_0_1102253740.docx

EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives= u correct results - but when placed inside a view then it does not seem to= work)
SELECT
=C2=A0 =C2=A0 split(reverse(split(reverse(floc),'/= 9;)[0]),'\\\\.')[0] as resid,=C2=A0
=C2=A0 =C2=A0 reverse(split(reverse(floc),'/')[1]= )=C2=A0 as yyyymm,
=C2=A0 =C2=A0 floc
FROM=C2=A0
=C2=A0 resume.qnap_resume_file_location

14012_6_1006686= 583_0_1102955123 201= 205 /mnt/myfirm/201205/1401= 2_6_1006686583_0_1102955123.docx
14009_5_1008440384_0= _1102887209 201205 /mnt/myfirm/201205/14009_5_10084403= 84_0_1102887209.doc
14013_7_700200576_8_1102752594 201205 /mnt/myfirm/201205/14013_7_700200576_8_110275259= 4.doc
14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc
14009_5_1008373102_0_1102709973 201205 = /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc
14011_1_1007981566_0_1102730546 = 201205 /mnt/myfirm/2= 01205/14011_1_1007981566_0_1102730546.txt
14015_8_100= 8436709_0_1102867682 201205= /mnt/myfirm/201205/14015_8= _1008436709_0_1102867682.docx
14001_7_1006347358_0_10= 07662042 201205 /mnt/myfirm/201205/14001_7_1006347358= _0_1007662042.docx
14009_5_1008382590_0_1102732450 201205 /mnt/myfirm/201205/14009_5_1008382590_0_110273245= 0.doc
14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx
<= div dir=3D"ltr">

Thoughts ? Ideas ?

thanks=

<= /div>
sanjay


--001a113a33d4a17e95051e14de0b--