hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jérôme Verdier <verdier.jerom...@gmail.com>
Subject Re: PL/SQL to HiveQL translation
Date Tue, 30 Jul 2013 07:20:53 GMT
Hi,

Thanks for this link, it was very helpful :-)

I have another question.

I have some HiveQL script wich are stored into .hql file.

What is the best way to execute these scripts with a Java/JDBC program ?

Thanks.


2013/7/29 Brendan Heussler <bheussler@gmail.com>

> Jerome,
>
> There is a really good page on the wiki:
> https://cwiki.apache.org/Hive/hiveserver2-clients.html
>
> I use the HiveServer2 JDBC driver.  Maybe there are other ways?
>
>
>
> Brendan
>
>
> On Mon, Jul 29, 2013 at 5:47 AM, Jérôme Verdier <
> verdier.jerome66@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks everyone for your help.
>>
>> Has anyone have a good tutorial to run Hive queries and scripts with Java
>> (over Eclipse). I have some Java Development basis but i'm  pretty new
>> using Hive with Java/Eclipse.
>>
>> Thanks.
>>
>>
>> 2013/7/25 j.barrett Strausser <j.barrett.strausser@gmail.com>
>>
>>> The advice I have always seen for your case is to transform the subquery
>>> in the WHERE clause into a LEFT OUTER JOIN.
>>>
>>>
>>>
>>>
>>> On Thu, Jul 25, 2013 at 11:04 AM, Edson Ramiro <erlfilho@gmail.com>wrote:
>>>
>>>> AFAIK,
>>>>
>>>> Hive supports subqueries only in the FROM clause.
>>>>
>>>> Maybe you have to split you query into more queries...
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
>>>>
>>>>
>>>>
>>>>
>>>>        Edson Ramiro
>>>>
>>>>
>>>> On Thu, Jul 25, 2013 at 9:31 AM, Jérôme Verdier <
>>>> verdier.jerome66@gmail.com> wrote:
>>>>
>>>>> Hi Bennie,
>>>>>
>>>>> I was trying some solutions to pass through my problem, and a problem
>>>>> occurs
>>>>>
>>>>> here is the error :
>>>>>
>>>>> FAILED: ParseException line 26:14 cannot recognize input near 'SELECT'
>>>>> 'cal' '.' in expression specification
>>>>>
>>>>> Is AND...BETWEEN ( SELECT..... is possible in Hive?
>>>>>
>>>>>
>>>>> 2013/7/25 Bennie Schut <bschut@ebuddy.com>
>>>>>
>>>>>>  Hi Jerome,
>>>>>>
>>>>>> Yes it looks like you could stop using GET_SEMAINE  and directly
>>>>>> joining "calendrier_hebdo" with "calendrier" for example. For
>>>>>> "FCALC_IDJOUR" you will have to make a udf so I hope you have some
java
>>>>>> skills :)
>>>>>> The "calendrier" tables suggests you have star schema with a calendar
>>>>>> table. If on oracle you partitioned on a date and use a subquery
to get the
>>>>>> dates you want from the fact table you can expect this to be a problem
in
>>>>>> hive. Partition pruning works during planning it will not know which
>>>>>> partitioned to prune and thus run on all the data in the fact table
and
>>>>>> filter after it's done making partitioning useless. There are ways
of
>>>>>> working around this, it seems most people decide to use a "deterministic"
>>>>>> udf which produces the dates and this causes the udfs to be run during
>>>>>> planning and partition pruning magically works again.
>>>>>> Hope this helps.
>>>>>>
>>>>>> Bennie.
>>>>>>
>>>>>> Op 25-7-2013 09:50, Jérôme Verdier schreef:
>>>>>>
>>>>>>    Hi,
>>>>>>
>>>>>>  I need some help to translate a PL/SQL script in HiveQL.
>>>>>>
>>>>>>  Problem : my PL/SQL script is calling two functions.
>>>>>>
>>>>>>  you can see the script below :
>>>>>>
>>>>>>     SELECT
>>>>>>       in_co_societe                             as co_societe,
>>>>>>       'SEMAINE'                                 as co_type_periode,
>>>>>>       a.type_entite                             as type_entite,
>>>>>>       a.code_entite                             as code_entite,
>>>>>>       a.type_rgrp_produits                      as type_rgrp_produits,
>>>>>>       a.co_rgrp_produits                        as co_rgrp_produits,
>>>>>>       SUM(a.MT_CA_NET_TTC)                      as MT_CA_NET_TTC,
>>>>>>       SUM(a.MT_OBJ_CA_NET_TTC)                  as MT_OBJ_CA_NET_TTC,
>>>>>>       SUM(a.NB_CLIENTS)                         as NB_CLIENTS,
>>>>>>       SUM(a.MT_CA_NET_TTC_COMP)                 as MT_CA_NET_TTC_COMP,
>>>>>>       SUM(a.MT_OBJ_CA_NET_TTC_COMP)             as
>>>>>> MT_OBJ_CA_NET_TTC_COMP,
>>>>>>       SUM(a.NB_CLIENTS_COMP)                    as NB_CLIENTS_COMP
>>>>>>     from
>>>>>>       kpi.thm_ca_rgrp_produits_jour/*@o_bi.match.eu*/ a
>>>>>>     WHERE
>>>>>>         a.co_societe = in_co_societe
>>>>>>     AND a.dt_jour between
>>>>>>       (
>>>>>>         SELECT
>>>>>>           cal.dt_jour_deb
>>>>>>         FROM ods.calendrier_hebdo cal
>>>>>>         WHERE cal.co_societe = in_co_societe
>>>>>>         AND cal.co_an_semaine = ods.package_date.get_semaine(
>>>>>>           ods.package_date.fcalc_idjour(
>>>>>>             CASE
>>>>>>               WHEN TO_CHAR(D_Dernier_Jour,'YYYY') =
>>>>>> TO_CHAR(D_Dernier_Jour-364,'YYYY') THEN
>>>>>>                 NEXT_DAY(D_Dernier_Jour-364,1)-7
>>>>>>               ELSE
>>>>>>                 D_Dernier_Jour-364
>>>>>>             END
>>>>>>           )
>>>>>>         )
>>>>>>       )
>>>>>>       AND D_Dernier_Jour-364
>>>>>>     -- On ne calcule rien si la semaine est compl¿¿te
>>>>>>     AND (
>>>>>>           TO_CHAR(D_Dernier_Jour,'DDMM') <> '3112'
>>>>>>       AND TO_CHAR(D_Dernier_Jour,'D') <> '7'
>>>>>>     )
>>>>>>     GROUP BY
>>>>>>       a.type_entite,
>>>>>>       a.code_entite,
>>>>>>       a.type_rgrp_produits,
>>>>>>       a.co_rgrp_produits;
>>>>>>
>>>>>>  The function ods.package_date.get_semaine is :
>>>>>>
>>>>>> FUNCTION GET_SEMAINE
>>>>>>        (ID_DEB  IN NUMBER)
>>>>>>   RETURN NUMBER
>>>>>>   IS
>>>>>>     SEMAINE  NUMBER(10);
>>>>>>   BEGIN
>>>>>>     SELECT CO_AN_SEMAINE
>>>>>>     INTO   SEMAINE
>>>>>>     FROM   CALENDRIER
>>>>>>     WHERE  ID_JOUR = ID_DEB;
>>>>>>
>>>>>>     RETURN (SEMAINE);
>>>>>>   EXCEPTION
>>>>>>     WHEN NO_DATA_FOUND THEN
>>>>>>       RETURN (0);
>>>>>>     WHEN OTHERS THEN
>>>>>>       RETURN (0);
>>>>>>   END;
>>>>>>
>>>>>>  The function ods.package_date.fcalc_idjour is below :
>>>>>>
>>>>>> FUNCTION FCALC_IDJOUR
>>>>>>        (DATE_REFERENCE  IN DATE)
>>>>>>   RETURN NUMBER
>>>>>>   IS
>>>>>>     NM_ANNEE        NUMBER :=
>>>>>> TO_NUMBER(TO_CHAR(DATE_REFERENCE,'YYYY'));
>>>>>>     NM_MOIS         NUMBER :=
>>>>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMM'),5,2));
>>>>>>     NM_JOUR         NUMBER :=
>>>>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMMDD'),7,2));
>>>>>>     IDJOUR_CALCULE  NUMBER := 0;
>>>>>>   BEGIN
>>>>>>     IF NM_ANNEE < 1998
>>>>>>         OR DATE_REFERENCE IS NULL THEN
>>>>>>       IDJOUR_CALCULE := 0;
>>>>>>     ELSE
>>>>>>       IDJOUR_CALCULE := ((NM_ANNEE - 1998) * 600) + ((NM_MOIS - 01)
*
>>>>>> 50) + NM_JOUR;
>>>>>>     END IF;
>>>>>>
>>>>>>     RETURN IDJOUR_CALCULE;
>>>>>>     DBMS_OUTPUT.PUT_LINE(IDJOUR_CALCULE);
>>>>>>   END FCALC_IDJOUR;
>>>>>>
>>>>>>  Is it possible to translate this in one  HiveQL script ?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Jérôme VERDIER*
>>>>> 06.72.19.17.31
>>>>> verdier.jerome66@gmail.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> https://github.com/bearrito
>>> @deepbearrito
>>>
>>
>>
>>
>> --
>> *Jérôme VERDIER*
>> 06.72.19.17.31
>> verdier.jerome66@gmail.com
>>
>>
>


-- 
*Jérôme VERDIER*
06.72.19.17.31
verdier.jerome66@gmail.com

Mime
View raw message