hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brendan Heussler <bheuss...@gmail.com>
Subject Re: PL/SQL to HiveQL translation
Date Mon, 29 Jul 2013 12:52:47 GMT
Jerome,

There is a really good page on the wiki:
https://cwiki.apache.org/Hive/hiveserver2-clients.html

I use the HiveServer2 JDBC driver.  Maybe there are other ways?



Brendan


On Mon, Jul 29, 2013 at 5:47 AM, Jérôme Verdier
<verdier.jerome66@gmail.com>wrote:

> Hi,
>
> Thanks everyone for your help.
>
> Has anyone have a good tutorial to run Hive queries and scripts with Java
> (over Eclipse). I have some Java Development basis but i'm  pretty new
> using Hive with Java/Eclipse.
>
> Thanks.
>
>
> 2013/7/25 j.barrett Strausser <j.barrett.strausser@gmail.com>
>
>> The advice I have always seen for your case is to transform the subquery
>> in the WHERE clause into a LEFT OUTER JOIN.
>>
>>
>>
>>
>> On Thu, Jul 25, 2013 at 11:04 AM, Edson Ramiro <erlfilho@gmail.com>wrote:
>>
>>> AFAIK,
>>>
>>> Hive supports subqueries only in the FROM clause.
>>>
>>> Maybe you have to split you query into more queries...
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
>>>
>>>
>>>
>>>
>>>        Edson Ramiro
>>>
>>>
>>> On Thu, Jul 25, 2013 at 9:31 AM, Jérôme Verdier <
>>> verdier.jerome66@gmail.com> wrote:
>>>
>>>> Hi Bennie,
>>>>
>>>> I was trying some solutions to pass through my problem, and a problem
>>>> occurs
>>>>
>>>> here is the error :
>>>>
>>>> FAILED: ParseException line 26:14 cannot recognize input near 'SELECT'
>>>> 'cal' '.' in expression specification
>>>>
>>>> Is AND...BETWEEN ( SELECT..... is possible in Hive?
>>>>
>>>>
>>>> 2013/7/25 Bennie Schut <bschut@ebuddy.com>
>>>>
>>>>>  Hi Jerome,
>>>>>
>>>>> Yes it looks like you could stop using GET_SEMAINE  and directly
>>>>> joining "calendrier_hebdo" with "calendrier" for example. For
>>>>> "FCALC_IDJOUR" you will have to make a udf so I hope you have some java
>>>>> skills :)
>>>>> The "calendrier" tables suggests you have star schema with a calendar
>>>>> table. If on oracle you partitioned on a date and use a subquery to get
the
>>>>> dates you want from the fact table you can expect this to be a problem
in
>>>>> hive. Partition pruning works during planning it will not know which
>>>>> partitioned to prune and thus run on all the data in the fact table and
>>>>> filter after it's done making partitioning useless. There are ways of
>>>>> working around this, it seems most people decide to use a "deterministic"
>>>>> udf which produces the dates and this causes the udfs to be run during
>>>>> planning and partition pruning magically works again.
>>>>> Hope this helps.
>>>>>
>>>>> Bennie.
>>>>>
>>>>> Op 25-7-2013 09:50, Jérôme Verdier schreef:
>>>>>
>>>>>    Hi,
>>>>>
>>>>>  I need some help to translate a PL/SQL script in HiveQL.
>>>>>
>>>>>  Problem : my PL/SQL script is calling two functions.
>>>>>
>>>>>  you can see the script below :
>>>>>
>>>>>     SELECT
>>>>>       in_co_societe                             as co_societe,
>>>>>       'SEMAINE'                                 as co_type_periode,
>>>>>       a.type_entite                             as type_entite,
>>>>>       a.code_entite                             as code_entite,
>>>>>       a.type_rgrp_produits                      as type_rgrp_produits,
>>>>>       a.co_rgrp_produits                        as co_rgrp_produits,
>>>>>       SUM(a.MT_CA_NET_TTC)                      as MT_CA_NET_TTC,
>>>>>       SUM(a.MT_OBJ_CA_NET_TTC)                  as MT_OBJ_CA_NET_TTC,
>>>>>       SUM(a.NB_CLIENTS)                         as NB_CLIENTS,
>>>>>       SUM(a.MT_CA_NET_TTC_COMP)                 as MT_CA_NET_TTC_COMP,
>>>>>       SUM(a.MT_OBJ_CA_NET_TTC_COMP)             as
>>>>> MT_OBJ_CA_NET_TTC_COMP,
>>>>>       SUM(a.NB_CLIENTS_COMP)                    as NB_CLIENTS_COMP
>>>>>     from
>>>>>       kpi.thm_ca_rgrp_produits_jour/*@o_bi.match.eu*/ a
>>>>>     WHERE
>>>>>         a.co_societe = in_co_societe
>>>>>     AND a.dt_jour between
>>>>>       (
>>>>>         SELECT
>>>>>           cal.dt_jour_deb
>>>>>         FROM ods.calendrier_hebdo cal
>>>>>         WHERE cal.co_societe = in_co_societe
>>>>>         AND cal.co_an_semaine = ods.package_date.get_semaine(
>>>>>           ods.package_date.fcalc_idjour(
>>>>>             CASE
>>>>>               WHEN TO_CHAR(D_Dernier_Jour,'YYYY') =
>>>>> TO_CHAR(D_Dernier_Jour-364,'YYYY') THEN
>>>>>                 NEXT_DAY(D_Dernier_Jour-364,1)-7
>>>>>               ELSE
>>>>>                 D_Dernier_Jour-364
>>>>>             END
>>>>>           )
>>>>>         )
>>>>>       )
>>>>>       AND D_Dernier_Jour-364
>>>>>     -- On ne calcule rien si la semaine est compl¿¿te
>>>>>     AND (
>>>>>           TO_CHAR(D_Dernier_Jour,'DDMM') <> '3112'
>>>>>       AND TO_CHAR(D_Dernier_Jour,'D') <> '7'
>>>>>     )
>>>>>     GROUP BY
>>>>>       a.type_entite,
>>>>>       a.code_entite,
>>>>>       a.type_rgrp_produits,
>>>>>       a.co_rgrp_produits;
>>>>>
>>>>>  The function ods.package_date.get_semaine is :
>>>>>
>>>>> FUNCTION GET_SEMAINE
>>>>>        (ID_DEB  IN NUMBER)
>>>>>   RETURN NUMBER
>>>>>   IS
>>>>>     SEMAINE  NUMBER(10);
>>>>>   BEGIN
>>>>>     SELECT CO_AN_SEMAINE
>>>>>     INTO   SEMAINE
>>>>>     FROM   CALENDRIER
>>>>>     WHERE  ID_JOUR = ID_DEB;
>>>>>
>>>>>     RETURN (SEMAINE);
>>>>>   EXCEPTION
>>>>>     WHEN NO_DATA_FOUND THEN
>>>>>       RETURN (0);
>>>>>     WHEN OTHERS THEN
>>>>>       RETURN (0);
>>>>>   END;
>>>>>
>>>>>  The function ods.package_date.fcalc_idjour is below :
>>>>>
>>>>> FUNCTION FCALC_IDJOUR
>>>>>        (DATE_REFERENCE  IN DATE)
>>>>>   RETURN NUMBER
>>>>>   IS
>>>>>     NM_ANNEE        NUMBER :=
>>>>> TO_NUMBER(TO_CHAR(DATE_REFERENCE,'YYYY'));
>>>>>     NM_MOIS         NUMBER :=
>>>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMM'),5,2));
>>>>>     NM_JOUR         NUMBER :=
>>>>> TO_NUMBER(SUBSTR(TO_CHAR(DATE_REFERENCE,'YYYYMMDD'),7,2));
>>>>>     IDJOUR_CALCULE  NUMBER := 0;
>>>>>   BEGIN
>>>>>     IF NM_ANNEE < 1998
>>>>>         OR DATE_REFERENCE IS NULL THEN
>>>>>       IDJOUR_CALCULE := 0;
>>>>>     ELSE
>>>>>       IDJOUR_CALCULE := ((NM_ANNEE - 1998) * 600) + ((NM_MOIS - 01) *
>>>>> 50) + NM_JOUR;
>>>>>     END IF;
>>>>>
>>>>>     RETURN IDJOUR_CALCULE;
>>>>>     DBMS_OUTPUT.PUT_LINE(IDJOUR_CALCULE);
>>>>>   END FCALC_IDJOUR;
>>>>>
>>>>>  Is it possible to translate this in one  HiveQL script ?
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Jérôme VERDIER*
>>>> 06.72.19.17.31
>>>> verdier.jerome66@gmail.com
>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>
>
> --
> *Jérôme VERDIER*
> 06.72.19.17.31
> verdier.jerome66@gmail.com
>
>

Mime
View raw message