hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jérôme Verdier <verdier.jerom...@gmail.com>
Subject Re: Use RANK OVER PARTITION function in Hive 0.11
Date Thu, 18 Jul 2013 09:34:29 GMT
Hi,

Since we saw that we have to give arguments in RANK() function, i'm trying
to translate this one (working on Oracle 10g) to be functionnally in Hive :

RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit ORDER BY
pnvente.dt_debut_commercial DESC,
COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang

i try this :

RANK(pnvente.dt_debut_commercial,
COALESCE(pnvente.id_produit,dem.id_produit)) OVER (PARTITION BY
mag.co_magasin, dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC,
COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang

and this :


RANK(pnvente.dt_debut_commercial, pnvente.id_produit, dem.id_produit) OVER
(PARTITION BY mag.co_magasin, dem.id_produit ORDER BY
pnvente.dt_debut_commercial DESC,
COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang

But Hive is giving me another error :

FAILED: SemanticException Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check for
circular dependencies.
Underlying error: Ranking Functions can take no arguments

i don't understand this error, in the first try, he said that he can't work
without arguments, and now, rank function is falling because of the
arguments.

what is wrong now ?



2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>

> my bad ... in relational databases we generally do not give a column name
> inside rank() ... but the one in (partition by .... order by..) is
> sufficient.
>
> But looks like that's not the case in Hive
>
>
> Jerome,
>
> Please look at the examples in link below. See if you are able to make it
> work
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-PARTITIONBYwithpartitioning%2CORDERBY%2Candwindowspecification
>
>
>
> Cant help you beyond this as i don't have Hive 0.11 :-(
>
>
> Richa
>
>
> On Wed, Jul 17, 2013 at 3:08 PM, Jérôme Verdier <
> verdier.jerome66@gmail.com> wrote:
>
>> Hi Richa,
>>
>> I have tried one query, with what i've understand of  Vijay's tips.
>>
>> SELECT code_entite, RANK(mag.me_vente_ht) OVER (PARTITION BY
>> mag.co_societe ORDER BY  mag.me_vente_ht) AS rank FROM
>> default.thm_renta_rgrp_produits_n_1 mag;
>>
>> This query is working, it gives me results.
>>
>> You say that maybe i'm hitting the same bug of JIRA HIVE-4663, but query
>> is also failling when i put analytical columns in...
>>
>>
>> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>>
>>> Vijay
>>>
>>> Jerome has already passed column -> mag.co_societe for rank.
>>>
>>> syntax -> RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>> mag.me_vente_ht)
>>> This will generate a rank for column mag.co_societe based on column
>>> value me_vente_ht
>>>
>>> Jerome,
>>>
>>> Its possible you are also hitting the same bug as I mentioned in my
>>> email before.
>>>
>>>
>>> Richa
>>>
>>>
>>> On Wed, Jul 17, 2013 at 2:31 PM, Vijay <techvd@gmail.com> wrote:
>>>
>>>> As the error message states: "One ore more arguments are expected," you
>>>> have to pass a column to the rank function.
>>>>
>>>>
>>>> On Wed, Jul 17, 2013 at 1:12 AM, Jérôme Verdier <
>>>> verdier.jerome66@gmail.com> wrote:
>>>>
>>>>> Hi Richa,
>>>>>
>>>>> I have tried a simple query without joins, etc....
>>>>>
>>>>> SELECT RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>>>> mag.me_vente_ht),mag.co_societe, mag.me_vente_ht FROM
>>>>> default.thm_renta_rgrp_produits_n_1 mag;
>>>>>
>>>>> Unfortunately, the error is the same like previously.
>>>>>
>>>>> Error: Query returned non-zero code: 40000, cause: FAILED:
>>>>> SemanticException Failed to breakup Windowing invocations into Groups.
At
>>>>> least 1 group must only depend on input columns. Also check for circular
>>>>> dependencies.
>>>>>
>>>>> Underlying error:
>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>>> arguments are expected.
>>>>> SQLState:  42000
>>>>> ErrorCode: 40000
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>
>>>>>> Jerome
>>>>>>
>>>>>> I would recommend that you try Rank function with columns from just
>>>>>> one table first.
>>>>>> Once it is established that rank is working fine then add all the
>>>>>> joins.
>>>>>>
>>>>>> I am still on Hive 0.10 so cannot test it myself.
>>>>>> However, I can find a similar issue on following link - so its
>>>>>> possible you are facing issues due to this reported bug.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/HIVE-4663
>>>>>>
>>>>>>
>>>>>> Richa
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 16, 2013 at 6:41 PM, Jérôme Verdier <
>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>
>>>>>>> You can see my query below :
>>>>>>>
>>>>>>> SELECT
>>>>>>>             mag.co_magasin,
>>>>>>>             dem.id_produit                                  as
>>>>>>> id_produit_orig,
>>>>>>>             pnvente.dt_debut_commercial                     as
>>>>>>> dt_debut_commercial,
>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)     as
>>>>>>> id_produit,
>>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit
>>>>>>> ORDER BY pnvente.dt_debut_commercial DESC,
>>>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>>>>>>>
>>>>>>> FROM default.demarque_mag_jour dem
>>>>>>>
>>>>>>>           LEFT OUTER JOIN default.produit_norm pn
>>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>>           LEFT OUTER JOIN default.produit_norm pnvente
>>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>>>           ON mag.id_magasin = dem.id_magasin
>>>>>>>
>>>>>>>
>>>>>>> GROUP BY
>>>>>>>             mag.co_magasin,
>>>>>>>             dem.id_produit,
>>>>>>>             pnvente.dt_debut_commercial,
>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit);
>>>>>>>
>>>>>>>
>>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>>
>>>>>>>> Can you share query with just RANK().
>>>>>>>>
>>>>>>>> Richa
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 16, 2013 at 6:08 PM, Jérôme Verdier <
>>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Richa,
>>>>>>>>>
>>>>>>>>> I tried to execute the rank function alone, but the result
is the
>>>>>>>>> same
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>>>>
>>>>>>>>>> Hi Jerome
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think the problem is you are trying to use MIN,
SUM and RANK
>>>>>>>>>> function in a single query.
>>>>>>>>>>
>>>>>>>>>> Try to get the rank first in a query and on top of
it apply these
>>>>>>>>>> aggregate functions
>>>>>>>>>>
>>>>>>>>>> Richa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 16, 2013 at 2:15 PM, Jérôme Verdier
<
>>>>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a problem while using RANK OVER PARTITION
function with
>>>>>>>>>>> Hive.
>>>>>>>>>>>
>>>>>>>>>>> Hive is in version 0.11 and, as we can see here
:
>>>>>>>>>>> https://cwiki.apache.org/Hive/languagemanual-windowingandanalytics.html,
>>>>>>>>>>> we can now use these functions in Hive.
>>>>>>>>>>>
>>>>>>>>>>> But, when i use it, i encountered this error
:
>>>>>>>>>>>
>>>>>>>>>>> FAILED: SemanticException Failed to breakup Windowing
>>>>>>>>>>> invocations into Groups. At least 1 group must
only depend on input
>>>>>>>>>>> columns. Also check for circular dependencies.
>>>>>>>>>>> Underlying error:
>>>>>>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException:
One or more
>>>>>>>>>>> arguments are expected.
>>>>>>>>>>>
>>>>>>>>>>> Here is my script :
>>>>>>>>>>>
>>>>>>>>>>> SELECT
>>>>>>>>>>>             mag.co_magasin,
>>>>>>>>>>>             dem.id_produit                  
               as
>>>>>>>>>>> id_produit_orig,
>>>>>>>>>>>             pnvente.dt_debut_commercial     
               as
>>>>>>>>>>> dt_debut_commercial,
>>>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
    as
>>>>>>>>>>> id_produit,
>>>>>>>>>>>             min(
>>>>>>>>>>>               CASE WHEN dem.co_validation IS
NULL THEN 0 ELSE 1
>>>>>>>>>>> END
>>>>>>>>>>>             )                               
               as
>>>>>>>>>>> flg_demarque_valide,
>>>>>>>>>>>             sum(CASE WHEN dem.co_validation IS
NULL THEN 0 ELSE
>>>>>>>>>>> CAST(dem.mt_revient_ope AS INT) END)
>>>>>>>>>>>                                             
               as
>>>>>>>>>>> me_dem_con_prx_cs,
>>>>>>>>>>>             0                               
               as
>>>>>>>>>>> me_dem_inc_prx_cs,
>>>>>>>>>>>             0                               
               as
>>>>>>>>>>> me_dem_prov_stk_cs,
>>>>>>>>>>>             sum(CASE WHEN dem.co_validation IS
NULL THEN 0 ELSE
>>>>>>>>>>> CAST(dem.qt_demarque AS INT) END)
>>>>>>>>>>>                                             
               as
>>>>>>>>>>> qt_dem_con,
>>>>>>>>>>>             0                               
               as
>>>>>>>>>>> qt_dem_inc,
>>>>>>>>>>>             0                               
               as
>>>>>>>>>>> qt_dem_prov_stk, -- !!!!!!!! VIRGULE
>>>>>>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin,
>>>>>>>>>>> dem.id_produit ORDER BY pnvente.dt_debut_commercial
DESC,
>>>>>>>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC)
as rang
>>>>>>>>>>>           from default.calendrier cal
>>>>>>>>>>>           INNER JOIN default.demarque_mag_jour
dem
>>>>>>>>>>>           ON  CASE WHEN dem.co_societe = 1 THEN
1 ELSE 2 END =
>>>>>>>>>>> '${hiveconf:in_co_societe}' -- A modifier
>>>>>>>>>>>           AND dem.dt_jour    = cal.dt_jour
>>>>>>>>>>>           LEFT OUTER JOIN default.produit_norm
pn
>>>>>>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>>>>>>           LEFT OUTER JOIN default.produit_norm
pnvente
>>>>>>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>>>>>>           AND pnvente.co_type_motif='05'
>>>>>>>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>>>>>>>           ON  mag.co_societe = '${hiveconf:in_co_societe}'
>>>>>>>>>>>           AND mag.id_magasin = dem.id_magasin
>>>>>>>>>>>           WHERE cal.dt_jour = '${hiveconf:in_dt_jour}'
>>>>>>>>>>>           AND NOT (dem.co_validation IS NULL
AND cal.dt_jour >
>>>>>>>>>>> unix_timestamp()-3*60*60*24) -- A verifier
>>>>>>>>>>>           -- JYP 4.4
>>>>>>>>>>>           AND dem.co_operation_magasin IN ('13','14','32')
>>>>>>>>>>>           GROUP BY
>>>>>>>>>>>             mag.co_magasin,
>>>>>>>>>>>             dem.id_produit,
>>>>>>>>>>>             pnvente.dt_debut_commercial,
>>>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Jérôme VERDIER*
>>>>>>>>>>> 06.72.19.17.31
>>>>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Jérôme VERDIER*
>>>>>>>>> 06.72.19.17.31
>>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Jérôme VERDIER*
>>>>>>> 06.72.19.17.31
>>>>>>> verdier.jerome66@gmail.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Jérôme VERDIER*
>>>>> 06.72.19.17.31
>>>>> verdier.jerome66@gmail.com
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> *Jérôme VERDIER*
>> 06.72.19.17.31
>> verdier.jerome66@gmail.com
>>
>>
>


-- 
*Jérôme VERDIER*
06.72.19.17.31
verdier.jerome66@gmail.com

Mime
View raw message