hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jérôme Verdier <verdier.jerom...@gmail.com>
Subject Re: Use RANK OVER PARTITION function in Hive 0.11
Date Wed, 17 Jul 2013 09:38:19 GMT
Hi Richa,

I have tried one query, with what i've understand of  Vijay's tips.

SELECT code_entite, RANK(mag.me_vente_ht) OVER (PARTITION BY mag.co_societe
ORDER BY  mag.me_vente_ht) AS rank FROM default.thm_renta_rgrp_produits_n_1
mag;

This query is working, it gives me results.

You say that maybe i'm hitting the same bug of JIRA HIVE-4663, but query is
also failling when i put analytical columns in...


2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>

> Vijay
>
> Jerome has already passed column -> mag.co_societe for rank.
>
> syntax -> RANK() OVER (PARTITION BY mag.co_societe ORDER BY
> mag.me_vente_ht)
> This will generate a rank for column mag.co_societe based on column value
> me_vente_ht
>
> Jerome,
>
> Its possible you are also hitting the same bug as I mentioned in my email
> before.
>
>
> Richa
>
>
> On Wed, Jul 17, 2013 at 2:31 PM, Vijay <techvd@gmail.com> wrote:
>
>> As the error message states: "One ore more arguments are expected," you
>> have to pass a column to the rank function.
>>
>>
>> On Wed, Jul 17, 2013 at 1:12 AM, Jérôme Verdier <
>> verdier.jerome66@gmail.com> wrote:
>>
>>> Hi Richa,
>>>
>>> I have tried a simple query without joins, etc....
>>>
>>> SELECT RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>> mag.me_vente_ht),mag.co_societe, mag.me_vente_ht FROM
>>> default.thm_renta_rgrp_produits_n_1 mag;
>>>
>>> Unfortunately, the error is the same like previously.
>>>
>>> Error: Query returned non-zero code: 40000, cause: FAILED:
>>> SemanticException Failed to breakup Windowing invocations into Groups. At
>>> least 1 group must only depend on input columns. Also check for circular
>>> dependencies.
>>>
>>> Underlying error:
>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>> arguments are expected.
>>> SQLState:  42000
>>> ErrorCode: 40000
>>>
>>>
>>>
>>>
>>> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>>>
>>>> Jerome
>>>>
>>>> I would recommend that you try Rank function with columns from just one
>>>> table first.
>>>> Once it is established that rank is working fine then add all the joins.
>>>>
>>>> I am still on Hive 0.10 so cannot test it myself.
>>>> However, I can find a similar issue on following link - so its possible
>>>> you are facing issues due to this reported bug.
>>>>
>>>> https://issues.apache.org/jira/browse/HIVE-4663
>>>>
>>>>
>>>> Richa
>>>>
>>>>
>>>> On Tue, Jul 16, 2013 at 6:41 PM, Jérôme Verdier <
>>>> verdier.jerome66@gmail.com> wrote:
>>>>
>>>>> You can see my query below :
>>>>>
>>>>> SELECT
>>>>>             mag.co_magasin,
>>>>>             dem.id_produit                                  as
>>>>> id_produit_orig,
>>>>>             pnvente.dt_debut_commercial                     as
>>>>> dt_debut_commercial,
>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)     as
>>>>> id_produit,
>>>>>             RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit
>>>>> ORDER BY pnvente.dt_debut_commercial DESC,
>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>>>>>
>>>>> FROM default.demarque_mag_jour dem
>>>>>
>>>>>           LEFT OUTER JOIN default.produit_norm pn
>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>           AND pn.id_produit = dem.id_produit
>>>>>           LEFT OUTER JOIN default.produit_norm pnvente
>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>           ON mag.id_magasin = dem.id_magasin
>>>>>
>>>>>
>>>>> GROUP BY
>>>>>             mag.co_magasin,
>>>>>             dem.id_produit,
>>>>>             pnvente.dt_debut_commercial,
>>>>>             COALESCE(pnvente.id_produit,dem.id_produit);
>>>>>
>>>>>
>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>
>>>>>> Can you share query with just RANK().
>>>>>>
>>>>>> Richa
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 16, 2013 at 6:08 PM, Jérôme Verdier <
>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Richa,
>>>>>>>
>>>>>>> I tried to execute the rank function alone, but the result is
the
>>>>>>> same
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>>
>>>>>>>> Hi Jerome
>>>>>>>>
>>>>>>>>
>>>>>>>> I think the problem is you are trying to use MIN, SUM and
RANK
>>>>>>>> function in a single query.
>>>>>>>>
>>>>>>>> Try to get the rank first in a query and on top of it apply
these
>>>>>>>> aggregate functions
>>>>>>>>
>>>>>>>> Richa
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 16, 2013 at 2:15 PM, Jérôme Verdier <
>>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a problem while using RANK OVER PARTITION function
with
>>>>>>>>> Hive.
>>>>>>>>>
>>>>>>>>> Hive is in version 0.11 and, as we can see here :
>>>>>>>>> https://cwiki.apache.org/Hive/languagemanual-windowingandanalytics.html,
>>>>>>>>> we can now use these functions in Hive.
>>>>>>>>>
>>>>>>>>> But, when i use it, i encountered this error :
>>>>>>>>>
>>>>>>>>> FAILED: SemanticException Failed to breakup Windowing
invocations
>>>>>>>>> into Groups. At least 1 group must only depend on input
columns. Also check
>>>>>>>>> for circular dependencies.
>>>>>>>>> Underlying error:
>>>>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException:
One or more
>>>>>>>>> arguments are expected.
>>>>>>>>>
>>>>>>>>> Here is my script :
>>>>>>>>>
>>>>>>>>> SELECT
>>>>>>>>>             mag.co_magasin,
>>>>>>>>>             dem.id_produit                          
       as
>>>>>>>>> id_produit_orig,
>>>>>>>>>             pnvente.dt_debut_commercial             
       as
>>>>>>>>> dt_debut_commercial,
>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
    as
>>>>>>>>> id_produit,
>>>>>>>>>             min(
>>>>>>>>>               CASE WHEN dem.co_validation IS NULL THEN
0 ELSE 1 END
>>>>>>>>>             )                                       
       as
>>>>>>>>> flg_demarque_valide,
>>>>>>>>>             sum(CASE WHEN dem.co_validation IS NULL THEN
0 ELSE
>>>>>>>>> CAST(dem.mt_revient_ope AS INT) END)
>>>>>>>>>                                                     
       as
>>>>>>>>> me_dem_con_prx_cs,
>>>>>>>>>             0                                       
       as
>>>>>>>>> me_dem_inc_prx_cs,
>>>>>>>>>             0                                       
       as
>>>>>>>>> me_dem_prov_stk_cs,
>>>>>>>>>             sum(CASE WHEN dem.co_validation IS NULL THEN
0 ELSE
>>>>>>>>> CAST(dem.qt_demarque AS INT) END)
>>>>>>>>>                                                     
       as
>>>>>>>>> qt_dem_con,
>>>>>>>>>             0                                       
       as
>>>>>>>>> qt_dem_inc,
>>>>>>>>>             0                                       
       as
>>>>>>>>> qt_dem_prov_stk, -- !!!!!!!! VIRGULE
>>>>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin,
>>>>>>>>> dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC,
>>>>>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as
rang
>>>>>>>>>           from default.calendrier cal
>>>>>>>>>           INNER JOIN default.demarque_mag_jour dem
>>>>>>>>>           ON  CASE WHEN dem.co_societe = 1 THEN 1 ELSE
2 END =
>>>>>>>>> '${hiveconf:in_co_societe}' -- A modifier
>>>>>>>>>           AND dem.dt_jour    = cal.dt_jour
>>>>>>>>>           LEFT OUTER JOIN default.produit_norm pn
>>>>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>>>>           LEFT OUTER JOIN default.produit_norm pnvente
>>>>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>>>>           AND pnvente.co_type_motif='05'
>>>>>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>>>>>           ON  mag.co_societe = '${hiveconf:in_co_societe}'
>>>>>>>>>           AND mag.id_magasin = dem.id_magasin
>>>>>>>>>           WHERE cal.dt_jour = '${hiveconf:in_dt_jour}'
>>>>>>>>>           AND NOT (dem.co_validation IS NULL AND cal.dt_jour
>
>>>>>>>>> unix_timestamp()-3*60*60*24) -- A verifier
>>>>>>>>>           -- JYP 4.4
>>>>>>>>>           AND dem.co_operation_magasin IN ('13','14','32')
>>>>>>>>>           GROUP BY
>>>>>>>>>             mag.co_magasin,
>>>>>>>>>             dem.id_produit,
>>>>>>>>>             pnvente.dt_debut_commercial,
>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Jérôme VERDIER*
>>>>>>>>> 06.72.19.17.31
>>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Jérôme VERDIER*
>>>>>>> 06.72.19.17.31
>>>>>>> verdier.jerome66@gmail.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Jérôme VERDIER*
>>>>> 06.72.19.17.31
>>>>> verdier.jerome66@gmail.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Jérôme VERDIER*
>>> 06.72.19.17.31
>>> verdier.jerome66@gmail.com
>>>
>>>
>>
>


-- 
*Jérôme VERDIER*
06.72.19.17.31
verdier.jerome66@gmail.com

Mime
View raw message