hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Richa Sharma <mailtorichasha...@gmail.com>
Subject Re: Use RANK OVER PARTITION function in Hive 0.11
Date Wed, 17 Jul 2013 12:45:09 GMT
my bad ... in relational databases we generally do not give a column name
inside rank() ... but the one in (partition by .... order by..) is
sufficient.

But looks like that's not the case in Hive


Jerome,

Please look at the examples in link below. See if you are able to make it
work

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-PARTITIONBYwithpartitioning%2CORDERBY%2Candwindowspecification



Cant help you beyond this as i don't have Hive 0.11 :-(


Richa


On Wed, Jul 17, 2013 at 3:08 PM, Jérôme Verdier
<verdier.jerome66@gmail.com>wrote:

> Hi Richa,
>
> I have tried one query, with what i've understand of  Vijay's tips.
>
> SELECT code_entite, RANK(mag.me_vente_ht) OVER (PARTITION BY
> mag.co_societe ORDER BY  mag.me_vente_ht) AS rank FROM
> default.thm_renta_rgrp_produits_n_1 mag;
>
> This query is working, it gives me results.
>
> You say that maybe i'm hitting the same bug of JIRA HIVE-4663, but query
> is also failling when i put analytical columns in...
>
>
> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>
>> Vijay
>>
>> Jerome has already passed column -> mag.co_societe for rank.
>>
>> syntax -> RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>> mag.me_vente_ht)
>> This will generate a rank for column mag.co_societe based on column value
>> me_vente_ht
>>
>> Jerome,
>>
>> Its possible you are also hitting the same bug as I mentioned in my email
>> before.
>>
>>
>> Richa
>>
>>
>> On Wed, Jul 17, 2013 at 2:31 PM, Vijay <techvd@gmail.com> wrote:
>>
>>> As the error message states: "One ore more arguments are expected," you
>>> have to pass a column to the rank function.
>>>
>>>
>>> On Wed, Jul 17, 2013 at 1:12 AM, Jérôme Verdier <
>>> verdier.jerome66@gmail.com> wrote:
>>>
>>>> Hi Richa,
>>>>
>>>> I have tried a simple query without joins, etc....
>>>>
>>>> SELECT RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>>> mag.me_vente_ht),mag.co_societe, mag.me_vente_ht FROM
>>>> default.thm_renta_rgrp_produits_n_1 mag;
>>>>
>>>> Unfortunately, the error is the same like previously.
>>>>
>>>> Error: Query returned non-zero code: 40000, cause: FAILED:
>>>> SemanticException Failed to breakup Windowing invocations into Groups. At
>>>> least 1 group must only depend on input columns. Also check for circular
>>>> dependencies.
>>>>
>>>> Underlying error:
>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>> arguments are expected.
>>>> SQLState:  42000
>>>> ErrorCode: 40000
>>>>
>>>>
>>>>
>>>>
>>>> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>>>>
>>>>> Jerome
>>>>>
>>>>> I would recommend that you try Rank function with columns from just
>>>>> one table first.
>>>>> Once it is established that rank is working fine then add all the
>>>>> joins.
>>>>>
>>>>> I am still on Hive 0.10 so cannot test it myself.
>>>>> However, I can find a similar issue on following link - so its
>>>>> possible you are facing issues due to this reported bug.
>>>>>
>>>>> https://issues.apache.org/jira/browse/HIVE-4663
>>>>>
>>>>>
>>>>> Richa
>>>>>
>>>>>
>>>>> On Tue, Jul 16, 2013 at 6:41 PM, Jérôme Verdier <
>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>
>>>>>> You can see my query below :
>>>>>>
>>>>>> SELECT
>>>>>>             mag.co_magasin,
>>>>>>             dem.id_produit                                  as
>>>>>> id_produit_orig,
>>>>>>             pnvente.dt_debut_commercial                     as
>>>>>> dt_debut_commercial,
>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)     as
>>>>>> id_produit,
>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit
>>>>>> ORDER BY pnvente.dt_debut_commercial DESC,
>>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>>>>>>
>>>>>> FROM default.demarque_mag_jour dem
>>>>>>
>>>>>>           LEFT OUTER JOIN default.produit_norm pn
>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>           LEFT OUTER JOIN default.produit_norm pnvente
>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>>           ON mag.id_magasin = dem.id_magasin
>>>>>>
>>>>>>
>>>>>> GROUP BY
>>>>>>             mag.co_magasin,
>>>>>>             dem.id_produit,
>>>>>>             pnvente.dt_debut_commercial,
>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit);
>>>>>>
>>>>>>
>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>
>>>>>>> Can you share query with just RANK().
>>>>>>>
>>>>>>> Richa
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 16, 2013 at 6:08 PM, Jérôme Verdier <
>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Richa,
>>>>>>>>
>>>>>>>> I tried to execute the rank function alone, but the result
is the
>>>>>>>> same
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>>>
>>>>>>>>> Hi Jerome
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think the problem is you are trying to use MIN, SUM
and RANK
>>>>>>>>> function in a single query.
>>>>>>>>>
>>>>>>>>> Try to get the rank first in a query and on top of it
apply these
>>>>>>>>> aggregate functions
>>>>>>>>>
>>>>>>>>> Richa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 16, 2013 at 2:15 PM, Jérôme Verdier <
>>>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have a problem while using RANK OVER PARTITION
function with
>>>>>>>>>> Hive.
>>>>>>>>>>
>>>>>>>>>> Hive is in version 0.11 and, as we can see here :
>>>>>>>>>> https://cwiki.apache.org/Hive/languagemanual-windowingandanalytics.html,
>>>>>>>>>> we can now use these functions in Hive.
>>>>>>>>>>
>>>>>>>>>> But, when i use it, i encountered this error :
>>>>>>>>>>
>>>>>>>>>> FAILED: SemanticException Failed to breakup Windowing
invocations
>>>>>>>>>> into Groups. At least 1 group must only depend on
input columns. Also check
>>>>>>>>>> for circular dependencies.
>>>>>>>>>> Underlying error:
>>>>>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException:
One or more
>>>>>>>>>> arguments are expected.
>>>>>>>>>>
>>>>>>>>>> Here is my script :
>>>>>>>>>>
>>>>>>>>>> SELECT
>>>>>>>>>>             mag.co_magasin,
>>>>>>>>>>             dem.id_produit                      
           as
>>>>>>>>>> id_produit_orig,
>>>>>>>>>>             pnvente.dt_debut_commercial         
           as
>>>>>>>>>> dt_debut_commercial,
>>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
    as
>>>>>>>>>> id_produit,
>>>>>>>>>>             min(
>>>>>>>>>>               CASE WHEN dem.co_validation IS NULL
THEN 0 ELSE 1
>>>>>>>>>> END
>>>>>>>>>>             )                                   
           as
>>>>>>>>>> flg_demarque_valide,
>>>>>>>>>>             sum(CASE WHEN dem.co_validation IS NULL
THEN 0 ELSE
>>>>>>>>>> CAST(dem.mt_revient_ope AS INT) END)
>>>>>>>>>>                                                 
           as
>>>>>>>>>> me_dem_con_prx_cs,
>>>>>>>>>>             0                                   
           as
>>>>>>>>>> me_dem_inc_prx_cs,
>>>>>>>>>>             0                                   
           as
>>>>>>>>>> me_dem_prov_stk_cs,
>>>>>>>>>>             sum(CASE WHEN dem.co_validation IS NULL
THEN 0 ELSE
>>>>>>>>>> CAST(dem.qt_demarque AS INT) END)
>>>>>>>>>>                                                 
           as
>>>>>>>>>> qt_dem_con,
>>>>>>>>>>             0                                   
           as
>>>>>>>>>> qt_dem_inc,
>>>>>>>>>>             0                                   
           as
>>>>>>>>>> qt_dem_prov_stk, -- !!!!!!!! VIRGULE
>>>>>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin,
>>>>>>>>>> dem.id_produit ORDER BY pnvente.dt_debut_commercial
DESC,
>>>>>>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC)
as rang
>>>>>>>>>>           from default.calendrier cal
>>>>>>>>>>           INNER JOIN default.demarque_mag_jour dem
>>>>>>>>>>           ON  CASE WHEN dem.co_societe = 1 THEN 1
ELSE 2 END =
>>>>>>>>>> '${hiveconf:in_co_societe}' -- A modifier
>>>>>>>>>>           AND dem.dt_jour    = cal.dt_jour
>>>>>>>>>>           LEFT OUTER JOIN default.produit_norm pn
>>>>>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>>>>>           LEFT OUTER JOIN default.produit_norm pnvente
>>>>>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>>>>>           AND pnvente.co_type_motif='05'
>>>>>>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>>>>>>           ON  mag.co_societe = '${hiveconf:in_co_societe}'
>>>>>>>>>>           AND mag.id_magasin = dem.id_magasin
>>>>>>>>>>           WHERE cal.dt_jour = '${hiveconf:in_dt_jour}'
>>>>>>>>>>           AND NOT (dem.co_validation IS NULL AND
cal.dt_jour >
>>>>>>>>>> unix_timestamp()-3*60*60*24) -- A verifier
>>>>>>>>>>           -- JYP 4.4
>>>>>>>>>>           AND dem.co_operation_magasin IN ('13','14','32')
>>>>>>>>>>           GROUP BY
>>>>>>>>>>             mag.co_magasin,
>>>>>>>>>>             dem.id_produit,
>>>>>>>>>>             pnvente.dt_debut_commercial,
>>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Jérôme VERDIER*
>>>>>>>>>> 06.72.19.17.31
>>>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Jérôme VERDIER*
>>>>>>>> 06.72.19.17.31
>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Jérôme VERDIER*
>>>>>> 06.72.19.17.31
>>>>>> verdier.jerome66@gmail.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Jérôme VERDIER*
>>>> 06.72.19.17.31
>>>> verdier.jerome66@gmail.com
>>>>
>>>>
>>>
>>
>
>
> --
> *Jérôme VERDIER*
> 06.72.19.17.31
> verdier.jerome66@gmail.com
>
>

Mime
View raw message