Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C76D7CA92 for ; Thu, 25 Jul 2013 15:36:00 +0000 (UTC) Received: (qmail 5688 invoked by uid 500); 25 Jul 2013 15:35:54 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 4930 invoked by uid 500); 25 Jul 2013 15:35:53 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 4674 invoked by uid 99); 25 Jul 2013 15:35:52 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jul 2013 15:35:52 +0000 X-ASF-Spam-Status: No, hits=2.5 required=5.0 tests=FREEMAIL_REPLY,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of j.barrett.strausser@gmail.com designates 74.125.82.48 as permitted sender) Received: from [74.125.82.48] (HELO mail-wg0-f48.google.com) (74.125.82.48) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jul 2013 15:35:46 +0000 Received: by mail-wg0-f48.google.com with SMTP id f12so1192111wgh.27 for ; Thu, 25 Jul 2013 08:35:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=tQUc49kSC6Hnh/LvOuTwHINLC3AXcmxy/4AhJYV/E9k=; b=S2Bo2x81No+YQuNDax4LPOalLxeSFpwwqRftTu4mUGEBinKsfoDZWGhn4q4eaBd7Q7 IfNEvPOl5cQ1fGroVW1vaXtKzxGWocLXhE3Sv25T76AxCBKxPYbN1Y/cconQWXRpEbmQ ZmyzyPFagNuBeif8kG2+wFFsNUH3h6jG4WOqx10/OfYAYpisvq/QRI/0h17fenHfrO9w VDo/F+TkefOXhsdRATglBFstAhHuLE64ZodyENlawklaQkBcXsGORNtq6xB+lax3XHog pzME/koGUoxGtGC49p6FTZAg0R3VBLCnvYTPVtvnF/0spAa3kHc3vGppkH9FcWSou9Xg nB8A== MIME-Version: 1.0 X-Received: by 10.180.205.236 with SMTP id lj12mr1992210wic.22.1374766525546; Thu, 25 Jul 2013 08:35:25 -0700 (PDT) Received: by 10.194.5.227 with HTTP; Thu, 25 Jul 2013 08:35:25 -0700 (PDT) In-Reply-To: References: <51F0ECF6.4050205@ebuddy.com> Date: Thu, 25 Jul 2013 11:35:25 -0400 Message-ID: Subject: Re: PL/SQL to HiveQL translation From: "j.barrett Strausser" To: user@hive.apache.org Content-Type: multipart/alternative; boundary=001a11c25f1c67b81d04e257c6f8 X-Virus-Checked: Checked by ClamAV on apache.org --001a11c25f1c67b81d04e257c6f8 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable The advice I have always seen for your case is to transform the subquery in the WHERE clause into a LEFT OUTER JOIN. On Thu, Jul 25, 2013 at 11:04 AM, Edson Ramiro wrote: > AFAIK, > > Hive supports subqueries only in the FROM clause. > > Maybe you have to split you query into more queries... > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQuerie= s > > > > > Edson Ramiro > > > On Thu, Jul 25, 2013 at 9:31 AM, J=E9r=F4me Verdier < > verdier.jerome66@gmail.com> wrote: > >> Hi Bennie, >> >> I was trying some solutions to pass through my problem, and a problem >> occurs >> >> here is the error : >> >> FAILED: ParseException line 26:14 cannot recognize input near 'SELECT' >> 'cal' '.' in expression specification >> >> Is AND...BETWEEN ( SELECT..... is possible in Hive? >> >> >> 2013/7/25 Bennie Schut >> >>> Hi Jerome, >>> >>> Yes it looks like you could stop using GET_SEMAINE and directly joinin= g >>> "calendrier_hebdo" with "calendrier" for example. For "FCALC_IDJOUR" yo= u >>> will have to make a udf so I hope you have some java skills :) >>> The "calendrier" tables suggests you have star schema with a calendar >>> table. If on oracle you partitioned on a date and use a subquery to get= the >>> dates you want from the fact table you can expect this to be a problem = in >>> hive. Partition pruning works during planning it will not know which >>> partitioned to prune and thus run on all the data in the fact table and >>> filter after it's done making partitioning useless. There are ways of >>> working around this, it seems most people decide to use a "deterministi= c" >>> udf which produces the dates and this causes the udfs to be run during >>> planning and partition pruning magically works again. >>> Hope this helps. >>> >>> Bennie. >>> >>> Op 25-7-2013 09:50, J=E9r=F4me Verdier schreef: >>> >>> Hi, >>> >>> I need some help to translate a PL/SQL script in HiveQL. >>> >>> Problem : my PL/SQL script is calling two functions. >>> >>> you can see the script below : >>> >>> SELECT >>> in_co_societe as co_societe, >>> 'SEMAINE' as co_type_periode, >>> a.type_entite as type_entite, >>> a.code_entite as code_entite, >>> a.type_rgrp_produits as type_rgrp_produits, >>> a.co_rgrp_produits as co_rgrp_produits, >>> SUM(a.MT_CA_NET_TTC) as MT_CA_NET_TTC, >>> SUM(a.MT_OBJ_CA_NET_TTC) as MT_OBJ_CA_NET_TTC, >>> SUM(a.NB_CLIENTS) as NB_CLIENTS, >>> SUM(a.MT_CA_NET_TTC_COMP) as MT_CA_NET_TTC_COMP, >>> SUM(a.MT_OBJ_CA_NET_TTC_COMP) as >>> MT_OBJ_CA_NET_TTC_COMP, >>> SUM(a.NB_CLIENTS_COMP) as NB_CLIENTS_COMP >>> from >>> kpi.thm_ca_rgrp_produits_jour/*@o_bi.match.eu*/ a >>> WHERE >>> a.co_societe =3D in_co_societe >>> AND a.dt_jour between >>> ( >>> SELECT >>> cal.dt_jour_deb >>> FROM ods.calendrier_hebdo cal >>> WHERE cal.co_societe =3D in_co_societe >>> AND cal.co_an_semaine =3D ods.package_date.get_semaine( >>> ods.package_date.fcalc_idjour( >>> CASE >>> WHEN TO_CHAR(D_Dernier_Jour,'YYYY') =3D >>> TO_CHAR(D_Dernier_Jour-364,'YYYY') THEN >>> NEXT_DAY(D_Dernier_Jour-364,1)-7 >>> ELSE >>> D_Dernier_Jour-364 >>> END >>> ) >>> ) >>> ) >>> AND D_Dernier_Jour-364 >>> -- On ne calcule rien si la semaine est compl=BF=BFte >>> AND ( >>> TO_CHAR(D_Dernier_Jour,'DDMM') <> '3112' >>> AND TO_CHAR(D_Dernier_Jour,'D') <> '7' >>> ) >>> GROUP BY >>> a.type_entite, >>> a.code_entite, >>> a.type_rgrp_produits, >>> a.co_rgrp_produits; >>> >>> The function ods.package_date.get_semaine is : >>> >>> FUNCTION GET_SEMAINE >>> (ID_DEB IN NUMBER) >>> RETURN NUMBER >>> IS >>> SEMAINE NUMBER(10); >>> BEGIN >>> SELECT CO_AN_SEMAINE >>> INTO SEMAINE >>> FROM CALENDRIER >>> WHERE ID_JOUR =3D ID_DEB; >>> >>> RETURN (SEMAINE); >>> EXCEPTION >>> WHEN NO_DATA_FOUND THEN >>> RETURN (0); >>> WHEN OTHERS THEN >>> RETURN (0); >>> END; >>> >>> The function ods.package_date.fcalc_idjour is below : >>> >>> FUNCTION FCALC_IDJOUR >>> (DATE_REFERENCE IN DATE) >>> RETURN NUMBER >>> IS >>> NM_ANNEE NUMBER :=3D TO_NUMBER(TO_CHAR(DATE_REFERENCE,'YYYY'= )); >>> NM_MOIS NUMBER :=3D >>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMM'),5,2)); >>> NM_JOUR NUMBER :=3D >>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMMDD'),7,2)); >>> IDJOUR_CALCULE NUMBER :=3D 0; >>> BEGIN >>> IF NM_ANNEE < 1998 >>> OR DATE_REFERENCE IS NULL THEN >>> IDJOUR_CALCULE :=3D 0; >>> ELSE >>> IDJOUR_CALCULE :=3D ((NM_ANNEE - 1998) * 600) + ((NM_MOIS - 01) * >>> 50) + NM_JOUR; >>> END IF; >>> >>> RETURN IDJOUR_CALCULE; >>> DBMS_OUTPUT.PUT_LINE(IDJOUR_CALCULE); >>> END FCALC_IDJOUR; >>> >>> Is it possible to translate this in one HiveQL script ? >>> >>> >>> >> >> >> -- >> *J=E9r=F4me VERDIER* >> 06.72.19.17.31 >> verdier.jerome66@gmail.com >> >> > --=20 https://github.com/bearrito @deepbearrito --001a11c25f1c67b81d04e257c6f8 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
The advice I have always seen for your case is to transfor= m the subquery in the WHERE clause into a LEFT OUTER JOIN.




On Thu, Jul = 25, 2013 at 11:04 AM, Edson Ramiro <erlfilho@gmail.com> wro= te:
AFAIK,=A0

Hive supports subqueries only in the FROM cla= use.=A0

Maybe you have to sp= lit you query into more queries...



=A0 =A0
=A0=A0 =A0 =A0 Edson Ramiro


On Thu, Jul 25, 2013 at 9:31 AM, J=E9r= =F4me Verdier <verdier.jerome66@gmail.com> wrote:
Hi Bennie,

I was trying some s= olutions to pass through my problem, and a problem occurs

here= is the error :

FAILED: ParseException line 26:14 cannot recognize input near 'SELECT&#= 39; 'cal' '.' in expression specification


Is AND...BETWEE= N ( SELECT..... is possible in Hive?


2013/7/25 Bennie Schut <= bschut@ebuddy.com>
=20 =20 =20
Hi Jerome,

Yes it looks like you could stop using GET_SEMAINE=A0 and directly joining "calendrier_hebdo" with "calendrier" for = example. For "FCALC_IDJOUR" you will have to make a udf so I hope you have some java skills :) The "calendrier" tables suggests you have star schema with = a calendar table. If on oracle you partitioned on a date and use a subquery to get the dates you want from the fact table you can expect this to be a problem in hive. Partition pruning works during planning it will not know which partitioned to prune and thus run on all the data in the fact table and filter after it's done making partitioning useless. There are ways of working around this, it seems most people decide to use a "deterministic" = udf which produces the dates and this causes the udfs to be run during planning and partition pruning magically works again.
Hope this helps.

Bennie.

Op 25-7-2013 09:50, J=E9r=F4me Verdier schreef:
Hi,

I need some help to translate a PL/SQL script in HiveQL.

Problem : my PL/SQL script is calling two functions.

you can see the script below :

=A0=A0=A0 S= ELECT
=A0=A0=A0=A0=A0 in_co_societe=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as co_societe,
=A0=A0=A0=A0=A0 'SEMAINE'=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as co_type_periode,
=A0=A0=A0=A0=A0 a.type_entite=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as type_entite,
=A0=A0=A0=A0=A0 a.code_entite=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as code_entite,
=A0=A0=A0=A0=A0 a.type_rgrp_produits=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as type_rgrp_produits,
=A0=A0=A0=A0=A0 a.co_rgrp_produits=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as co_rgrp_produits,
=A0=A0=A0=A0=A0 SUM(a.MT_CA_NET_TTC)=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as MT_CA_NET_TTC,
=A0=A0=A0=A0=A0 SUM(a.MT_OBJ_CA_NET_TTC)=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as MT_OBJ_CA_NET_TTC,
=A0=A0=A0=A0=A0 SUM(a.NB_CLIENTS)=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as NB_CLIENTS,
=A0=A0=A0=A0=A0 SUM(a.MT_CA_NET_TTC_COMP)=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as MT_CA_NET_TTC_COMP,
=A0=A0=A0=A0=A0 SUM(a.MT_OBJ_CA_NET_TTC_COMP)=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0 as MT_OBJ_CA_NET_TTC_COMP,
=A0=A0=A0=A0=A0 SUM(a.NB_CLIENTS_COMP)=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 as NB_CLIENTS_COMP
=A0=A0=A0 from
=A0=A0=A0=A0=A0 kpi.thm_ca_rgrp_produits_jour/*@o_bi.match.eu*/ a
=A0=A0=A0 WHERE
=A0=A0=A0=A0=A0=A0=A0 a.co_societe =3D in_co_societe
=A0=A0=A0 AND a.dt_jour between
=A0=A0=A0=A0=A0 (
=A0=A0=A0=A0=A0=A0=A0 SELECT
=A0=A0=A0=A0=A0=A0=A0=A0=A0 cal.dt_jour_deb
=A0=A0=A0=A0=A0=A0=A0 FROM ods.calendrier_hebdo cal
=A0=A0=A0=A0=A0=A0=A0 WHERE cal.co_societe =3D in_co_societ= e
=A0=A0=A0=A0=A0=A0=A0 AND cal.co_an_semaine =3D ods.package_date.get_semaine(
=A0=A0=A0=A0=A0=A0=A0=A0=A0 ods.package_date.fcalc_idjour(<= br> =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 CASE
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 WHEN TO_CHAR(D_Dern= ier_Jour,'YYYY') =3D TO_CHAR(D_Dernier_Jour-364,'YYYY') THEN
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 NEXT_DAY(D_De= rnier_Jour-364,1)-7
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 ELSE
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 D_Dernier_Jou= r-364
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 END
=A0=A0=A0=A0=A0=A0=A0=A0=A0 )
=A0=A0=A0=A0=A0=A0=A0 )
=A0=A0=A0=A0=A0 )
=A0=A0=A0=A0=A0 AND D_Dernier_Jour-364
=A0=A0=A0 -- On ne calcule rien si la semaine est compl=BF= =BFte
=A0=A0=A0 AND (
=A0=A0=A0=A0=A0=A0=A0=A0=A0 TO_CHAR(D_Dernier_Jour,'DDM= M') <> '3112'
=A0=A0=A0=A0=A0 AND TO_CHAR(D_Dernier_Jour,'D') <= ;> '7'
=A0=A0=A0 )
=A0=A0=A0 GROUP BY
=A0=A0=A0=A0=A0 a.type_entite,
=A0=A0=A0=A0=A0 a.code_entite,
=A0=A0=A0=A0=A0 a.type_rgrp_produits,
=A0=A0=A0=A0=A0 a.co_rgrp_produits;


The function ods.package_date.get_semaine is :

FUNCTION GET_SEMAINE
=A0=A0=A0=A0=A0=A0 (ID_DEB=A0 IN NUMBER)
=A0 RETURN NUMBER
=A0 IS
=A0=A0=A0 SEMAINE=A0 NUMBER(10);
=A0 BEGIN
=A0=A0=A0 SELECT CO_AN_SEMAINE
=A0=A0=A0 INTO=A0=A0 SEMAINE
=A0=A0=A0 FROM=A0=A0 CALENDRIER
=A0=A0=A0 WHERE=A0 ID_JOUR =3D ID_DEB;

=A0=A0=A0 RETURN (SEMAINE);
=A0 EXCEPTION
=A0=A0=A0 WHEN NO_DATA_FOUND THEN
=A0=A0=A0=A0=A0 RETURN (0);
=A0=A0=A0 WHEN OTHERS THEN
=A0=A0=A0=A0=A0 RETURN (0);
=A0 END;


The function ods.package_date.fcalc_idjour is below :

FUNCTION FCALC_IDJOUR
=A0=A0=A0=A0=A0=A0 (DATE_REFERENCE=A0 IN DATE)
=A0 RETURN NUMBER
=A0 IS
=A0=A0=A0 NM_ANNEE=A0=A0=A0=A0=A0=A0=A0 NUMBER :=3D TO_NUMBER(TO_CHAR(DATE_REFERENCE,'YYYY'));
=A0=A0=A0 NM_MOIS=A0=A0=A0=A0=A0=A0=A0=A0 NUMBER :=3D TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMM'),5,2))= ;
=A0=A0=A0 NM_JOUR=A0=A0=A0=A0=A0=A0=A0=A0 NUMBER :=3D TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMMDD'),7,2= ));
=A0=A0=A0 IDJOUR_CALCULE=A0 NUMBER :=3D 0;
=A0 BEGIN
=A0=A0=A0 IF NM_ANNEE < 1998
=A0=A0=A0=A0=A0=A0=A0 OR DATE_REFERENCE IS NULL THEN
=A0=A0=A0=A0=A0 IDJOUR_CALCULE :=3D 0;
=A0=A0=A0 ELSE
=A0=A0=A0=A0=A0 IDJOUR_CALCULE :=3D ((NM_ANNEE - 1998) * 600) + ((NM_MOIS - 01) * 50) + NM_JOUR;
=A0=A0=A0 END IF;

=A0=A0=A0 RETURN IDJOUR_CALCULE;
=A0=A0=A0 DBMS_OUTPUT.PUT_LINE(IDJOUR_CALCULE);
=A0 END FCALC_IDJOUR;


Is it possible to translate this in one=A0 HiveQL script ?




--
J=E9r=F4me VERDIER
06.72.19.17.31





--
--001a11c25f1c67b81d04e257c6f8--