hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jérôme Verdier <verdier.jerom...@gmail.com>
Subject Re: PL/SQL to HiveQL translation
Date Mon, 29 Jul 2013 09:47:10 GMT
Hi,

Thanks everyone for your help.

Has anyone have a good tutorial to run Hive queries and scripts with Java
(over Eclipse). I have some Java Development basis but i'm  pretty new
using Hive with Java/Eclipse.

Thanks.


2013/7/25 j.barrett Strausser <j.barrett.strausser@gmail.com>

> The advice I have always seen for your case is to transform the subquery
> in the WHERE clause into a LEFT OUTER JOIN.
>
>
>
>
> On Thu, Jul 25, 2013 at 11:04 AM, Edson Ramiro <erlfilho@gmail.com> wrote:
>
>> AFAIK,
>>
>> Hive supports subqueries only in the FROM clause.
>>
>> Maybe you have to split you query into more queries...
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
>>
>>
>>
>>
>>        Edson Ramiro
>>
>>
>> On Thu, Jul 25, 2013 at 9:31 AM, Jérôme Verdier <
>> verdier.jerome66@gmail.com> wrote:
>>
>>> Hi Bennie,
>>>
>>> I was trying some solutions to pass through my problem, and a problem
>>> occurs
>>>
>>> here is the error :
>>>
>>> FAILED: ParseException line 26:14 cannot recognize input near 'SELECT'
>>> 'cal' '.' in expression specification
>>>
>>> Is AND...BETWEEN ( SELECT..... is possible in Hive?
>>>
>>>
>>> 2013/7/25 Bennie Schut <bschut@ebuddy.com>
>>>
>>>>  Hi Jerome,
>>>>
>>>> Yes it looks like you could stop using GET_SEMAINE  and directly
>>>> joining "calendrier_hebdo" with "calendrier" for example. For
>>>> "FCALC_IDJOUR" you will have to make a udf so I hope you have some java
>>>> skills :)
>>>> The "calendrier" tables suggests you have star schema with a calendar
>>>> table. If on oracle you partitioned on a date and use a subquery to get the
>>>> dates you want from the fact table you can expect this to be a problem in
>>>> hive. Partition pruning works during planning it will not know which
>>>> partitioned to prune and thus run on all the data in the fact table and
>>>> filter after it's done making partitioning useless. There are ways of
>>>> working around this, it seems most people decide to use a "deterministic"
>>>> udf which produces the dates and this causes the udfs to be run during
>>>> planning and partition pruning magically works again.
>>>> Hope this helps.
>>>>
>>>> Bennie.
>>>>
>>>> Op 25-7-2013 09:50, Jérôme Verdier schreef:
>>>>
>>>>    Hi,
>>>>
>>>>  I need some help to translate a PL/SQL script in HiveQL.
>>>>
>>>>  Problem : my PL/SQL script is calling two functions.
>>>>
>>>>  you can see the script below :
>>>>
>>>>     SELECT
>>>>       in_co_societe                             as co_societe,
>>>>       'SEMAINE'                                 as co_type_periode,
>>>>       a.type_entite                             as type_entite,
>>>>       a.code_entite                             as code_entite,
>>>>       a.type_rgrp_produits                      as type_rgrp_produits,
>>>>       a.co_rgrp_produits                        as co_rgrp_produits,
>>>>       SUM(a.MT_CA_NET_TTC)                      as MT_CA_NET_TTC,
>>>>       SUM(a.MT_OBJ_CA_NET_TTC)                  as MT_OBJ_CA_NET_TTC,
>>>>       SUM(a.NB_CLIENTS)                         as NB_CLIENTS,
>>>>       SUM(a.MT_CA_NET_TTC_COMP)                 as MT_CA_NET_TTC_COMP,
>>>>       SUM(a.MT_OBJ_CA_NET_TTC_COMP)             as
>>>> MT_OBJ_CA_NET_TTC_COMP,
>>>>       SUM(a.NB_CLIENTS_COMP)                    as NB_CLIENTS_COMP
>>>>     from
>>>>       kpi.thm_ca_rgrp_produits_jour/*@o_bi.match.eu*/ a
>>>>     WHERE
>>>>         a.co_societe = in_co_societe
>>>>     AND a.dt_jour between
>>>>       (
>>>>         SELECT
>>>>           cal.dt_jour_deb
>>>>         FROM ods.calendrier_hebdo cal
>>>>         WHERE cal.co_societe = in_co_societe
>>>>         AND cal.co_an_semaine = ods.package_date.get_semaine(
>>>>           ods.package_date.fcalc_idjour(
>>>>             CASE
>>>>               WHEN TO_CHAR(D_Dernier_Jour,'YYYY') =
>>>> TO_CHAR(D_Dernier_Jour-364,'YYYY') THEN
>>>>                 NEXT_DAY(D_Dernier_Jour-364,1)-7
>>>>               ELSE
>>>>                 D_Dernier_Jour-364
>>>>             END
>>>>           )
>>>>         )
>>>>       )
>>>>       AND D_Dernier_Jour-364
>>>>     -- On ne calcule rien si la semaine est compl¿¿te
>>>>     AND (
>>>>           TO_CHAR(D_Dernier_Jour,'DDMM') <> '3112'
>>>>       AND TO_CHAR(D_Dernier_Jour,'D') <> '7'
>>>>     )
>>>>     GROUP BY
>>>>       a.type_entite,
>>>>       a.code_entite,
>>>>       a.type_rgrp_produits,
>>>>       a.co_rgrp_produits;
>>>>
>>>>  The function ods.package_date.get_semaine is :
>>>>
>>>> FUNCTION GET_SEMAINE
>>>>        (ID_DEB  IN NUMBER)
>>>>   RETURN NUMBER
>>>>   IS
>>>>     SEMAINE  NUMBER(10);
>>>>   BEGIN
>>>>     SELECT CO_AN_SEMAINE
>>>>     INTO   SEMAINE
>>>>     FROM   CALENDRIER
>>>>     WHERE  ID_JOUR = ID_DEB;
>>>>
>>>>     RETURN (SEMAINE);
>>>>   EXCEPTION
>>>>     WHEN NO_DATA_FOUND THEN
>>>>       RETURN (0);
>>>>     WHEN OTHERS THEN
>>>>       RETURN (0);
>>>>   END;
>>>>
>>>>  The function ods.package_date.fcalc_idjour is below :
>>>>
>>>> FUNCTION FCALC_IDJOUR
>>>>        (DATE_REFERENCE  IN DATE)
>>>>   RETURN NUMBER
>>>>   IS
>>>>     NM_ANNEE        NUMBER := TO_NUMBER(TO_CHAR(DATE_REFERENCE,'YYYY'));
>>>>     NM_MOIS         NUMBER :=
>>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMM'),5,2));
>>>>     NM_JOUR         NUMBER :=
>>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMMDD'),7,2));
>>>>     IDJOUR_CALCULE  NUMBER := 0;
>>>>   BEGIN
>>>>     IF NM_ANNEE < 1998
>>>>         OR DATE_REFERENCE IS NULL THEN
>>>>       IDJOUR_CALCULE := 0;
>>>>     ELSE
>>>>       IDJOUR_CALCULE := ((NM_ANNEE - 1998) * 600) + ((NM_MOIS - 01) *
>>>> 50) + NM_JOUR;
>>>>     END IF;
>>>>
>>>>     RETURN IDJOUR_CALCULE;
>>>>     DBMS_OUTPUT.PUT_LINE(IDJOUR_CALCULE);
>>>>   END FCALC_IDJOUR;
>>>>
>>>>  Is it possible to translate this in one  HiveQL script ?
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Jérôme VERDIER*
>>> 06.72.19.17.31
>>> verdier.jerome66@gmail.com
>>>
>>>
>>
>
>
> --
>
>
> https://github.com/bearrito
> @deepbearrito
>



-- 
*Jérôme VERDIER*
06.72.19.17.31
verdier.jerome66@gmail.com

Mime
View raw message