Return-Path: Delivered-To: apmail-hadoop-hive-user-archive@minotaur.apache.org Received: (qmail 10661 invoked from network); 15 Dec 2009 23:42:39 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 15 Dec 2009 23:42:39 -0000 Received: (qmail 32674 invoked by uid 500); 15 Dec 2009 23:42:39 -0000 Delivered-To: apmail-hadoop-hive-user-archive@hadoop.apache.org Received: (qmail 32613 invoked by uid 500); 15 Dec 2009 23:42:39 -0000 Mailing-List: contact hive-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hive-user@hadoop.apache.org Delivered-To: mailing list hive-user@hadoop.apache.org Received: (qmail 32604 invoked by uid 99); 15 Dec 2009 23:42:39 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 15 Dec 2009 23:42:39 +0000 X-ASF-Spam-Status: No, hits=-11.3 required=5.0 tests=AWL,BAYES_00,HABEAS_ACCREDITED_SOI,HTML_MESSAGE,RCVD_IN_BSP_TRUSTED X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [216.33.244.6] (HELO rhv-mipot-001.corp.ebay.com) (216.33.244.6) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 15 Dec 2009 23:42:33 +0000 X-EBay-Corp: Yes X-IronPort-AV: E=Sophos;i="4.47,402,1257148800"; d="scan'208,217";a="49988949" Received: from rhv-vtenf-002.corp.ebay.com (HELO RHV-MEXBH-001.corp.ebay.com) ([10.112.113.53]) by rhv-mipot-001.corp.ebay.com with ESMTP; 15 Dec 2009 15:42:13 -0800 Received: from RHV-EXM-03.corp.ebay.com ([10.245.17.55]) by RHV-MEXBH-001.corp.ebay.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 15 Dec 2009 15:42:13 -0800 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01CA7DE0.39F4413C" Subject: RE: LIKE operator Date: Tue, 15 Dec 2009 15:42:12 -0800 Message-ID: <82979CF82115834EAB542F32B32FBFC90935295B@RHV-EXM-03.corp.ebay.com> In-Reply-To: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: LIKE operator Thread-Index: Acp80STfeYYlyUKkTVeMRm1ImF97sgAAMeEgAEI/OlAAACXt8AAAF+lQAAAywfAAAE60sAAAInbQAABO8UAAABuPIA== References: <82979CF82115834EAB542F32B32FBFC909352914@RHV-EXM-03.corp.ebay.com> <82979CF82115834EAB542F32B32FBFC909352927@RHV-EXM-03.corp.ebay.com> <82979CF82115834EAB542F32B32FBFC90935294E@RHV-EXM-03.corp.ebay.com> From: "Sagi, Lee" To: X-OriginalArrivalTime: 15 Dec 2009 23:42:13.0412 (UTC) FILETIME=[3A159640:01CA7DE0] X-CFilter: Scanned This is a multi-part message in MIME format. ------_=_NextPart_001_01CA7DE0.39F4413C Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: quoted-printable Table A is only one hour worth of data (around 200,000 records), and table B is around 300 records, I'm not sure if partitioning will do much. =20 Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947=20 =20 ________________________________ From: Namit Jain [mailto:njain@facebook.com]=20 Sent: Tuesday, December 15, 2009 3:39 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator Can you partition the data in A and B and then join the partitions and then union all them. =20 =20 =20 From: Sagi, Lee [mailto:lsagi@shopping.com]=20 Sent: Tuesday, December 15, 2009 3:35 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator =20 Yes, currently I create the table first than insert (just wanted to have less code/scripts). =20 And yes, this query/join is painfully slow, but I need to flag records that match an IP pattern...I'm open for suggestions. =20 =20 Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947=20 =20 =20 ________________________________ From: Namit Jain [mailto:njain@facebook.com]=20 Sent: Tuesday, December 15, 2009 3:27 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator I should clarify that this is not the most efficient way of doing it - since we are doing a Cartesian product first (which will go to 1 reducer). =20 We do support CREATE TABLE AS SELECT in hive trunk, but it is not part of 0.4 =20 =20 You can create the table and then insert into that table. =20 =20 -namit =20 From: Sagi, Lee [mailto:lsagi@shopping.com]=20 Sent: Tuesday, December 15, 2009 3:18 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator =20 Thanks your re-write works perfectly. =20 Allow me to piggy-back and ask a follow-up question, how can I create a table as the result of this query? I tired the obvious CREATE TABLE AAA AS SELECT .... with not success. =20 =20 Thanks. =20 Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947=20 =20 =20 ________________________________ From: Namit Jain [mailto:njain@facebook.com]=20 Sent: Tuesday, December 15, 2009 3:12 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator Hive only supports equality joins right now: =20 INSERT OVERWRITE TABLE ZZ SELECT VOTF_REQUEST_ID , THRESHOLD_VALUE , THRESHOLD_MET , BRAND_ID FROM A LEFT OUTER JOIN B ON (A.CLIENT_IP LIKE B.IP) WHERE date_key =3D '2009121315'; =20 =20 Can be rewritten as: =20 INSERT OVERWRITE TABLE ZZ SELECT VOTF_REQUEST_ID , THRESHOLD_VALUE , THRESHOLD_MET , BRAND_ID FROM A LEFT OUTER JOIN B=20 WHERE date_key =3D '2009121315' and A.CLIENT_IP LIKE B.IP; =20 =20 From: Sagi, Lee [mailto:lsagi@shopping.com]=20 Sent: Tuesday, December 15, 2009 3:09 PM To: hive-user@hadoop.apache.org Subject: LIKE operator =20 I am trying to insert data into table ZZ with an outer join result of A and B, the Join is a LIKE b/t a filed from table A and a field from table B: =20 INSERT OVERWRITE TABLE ZZ SELECT VOTF_REQUEST_ID , THRESHOLD_VALUE , THRESHOLD_MET , BRAND_ID FROM A LEFT OUTER JOIN B ON (A.CLIENT_IP LIKE B.IP) WHERE date_key =3D '2009121315'; =20 B.IP has values like: 1.2.%.%.%, 10.10.10.%, etc. =20 I get the following error: FAILED: Error in semantic analysis: line 117:57 Both Left and Right Aliases Encountered in Join IP =20 Any ideas? =20 =20 Thanks. =20 Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947=20 ------_=_NextPart_001_01CA7DE0.39F4413C Content-Type: text/html; charset="US-ASCII" Content-Transfer-Encoding: quoted-printable
Table A is only one hour worth of data (around = 200,000=20 records), and table B is around 300 records, I'm not sure if = partitioning will=20 do much.
 

Lee = Sagi | Data=20 Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell:=20 718-930-7947

 


From: Namit Jain = [mailto:njain@facebook.com]=20
Sent: Tuesday, December 15, 2009 3:39 PM
To:=20 hive-user@hadoop.apache.org
Subject: RE: LIKE=20 operator

Can=20 you partition the data in A and B and then join the partitions and then = union=20 all them.

 

 

 

From: Sagi, Lee = [mailto:lsagi@shopping.com]
Sent: Tuesday, December 15, 2009 = 3:35=20 PM
To: hive-user@hadoop.apache.org
Subject: RE: LIKE = operator

 

Yes,=20 currently I create the table first than insert (just wanted to have less = code/scripts).

 

And yes,=20 this query/join is painfully slow, but I need to flag records that match = an IP=20 pattern...I'm open for suggestions.

 

 

Lee=20 Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | = Cell:=20 718-930-7947

 

 


From: Namit = Jain=20 [mailto:njain@facebook.com]
Sent: Tuesday, December 15, 2009 = 3:27=20 PM
To: hive-user@hadoop.apache.org
Subject: RE: LIKE = operator

I=20 should clarify that this is not the most efficient way of doing it = – since we=20 are doing a Cartesian product first (which will go to 1=20 reducer).

 

We=20 do support CREATE TABLE AS SELECT in hive trunk, but it is not part of=20 0.4

 

 

You=20 can create the table and then insert into that = table.

 

 

-namit

 

From: Sagi, Lee = [mailto:lsagi@shopping.com]
Sent: Tuesday, December 15, 2009 = 3:18=20 PM
To: hive-user@hadoop.apache.org
Subject: RE: LIKE = operator

 

Thanks=20 your re-write works perfectly.

 

Allow me=20 to piggy-back and ask a follow-up question, how can I create a table as = the=20 result of this query?

I tired=20 the obvious CREATE TABLE AAA AS SELECT .... with not=20 success.

 

 

Thanks.

 

Lee=20 Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | = Cell:=20 718-930-7947

 

 


From: Namit = Jain=20 [mailto:njain@facebook.com]
Sent: Tuesday, December 15, 2009 = 3:12=20 PM
To: hive-user@hadoop.apache.org
Subject: RE: LIKE = operator

Hive=20 only supports equality joins right now:

 

INSERT OVERWRITE TABLE ZZ
SELECT VOTF_REQUEST_ID=20 ,
THRESHOLD_VALUE ,
THRESHOLD_MET ,
BRAND_ID
FROM A LEFT = OUTER JOIN=20 B ON (A.CLIENT_IP LIKE B.IP)
WHERE date_key =3D = '2009121315';

 

 

Can=20 be rewritten as:

 

INSERT OVERWRITE TABLE ZZ
SELECT VOTF_REQUEST_ID=20 ,
THRESHOLD_VALUE ,
THRESHOLD_MET ,
BRAND_ID
FROM A LEFT = OUTER JOIN=20 B
WHERE date_key =3D '2009121315'

and A.CLIENT_IP LIKE B.IP;

 

 

From: Sagi, Lee = [mailto:lsagi@shopping.com]
Sent: Tuesday, December 15, 2009 = 3:09=20 PM
To: hive-user@hadoop.apache.org
Subject: LIKE=20 operator

 

I am trying to insert data into table ZZ with an outer join = result of=20 A and B,  the Join is a LIKE b/t a filed from table A and a field = from=20 table B:

 

INSERT OVERWRITE TABLE ZZ
SELECT VOTF_REQUEST_ID=20 ,
THRESHOLD_VALUE ,
THRESHOLD_MET ,
BRAND_ID
FROM A LEFT = OUTER JOIN=20 B ON (A.CLIENT_IP LIKE B.IP)
WHERE date_key =3D = '2009121315';

 

B.IP has values like: 1.2.%.%.%, 10.10.10.%, = etc.

 

I get the following error: FAILED: Error in semantic = analysis: line=20 117:57 Both Left and Right Aliases Encountered in Join IP

 

Any ideas?

 

 

Thanks.

 

Lee=20 Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | = Cell:=20 718-930-7947

------_=_NextPart_001_01CA7DE0.39F4413C--