hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jérôme Verdier <verdier.jerom...@gmail.com>
Subject Re: Use RANK OVER PARTITION function in Hive 0.11
Date Fri, 19 Jul 2013 07:50:29 GMT
Hi,

The error is resolved.

I have 3 RANK() function in my script.

2 are working with arguments, but one is working without arguments.

I don't understand it, but the fact is it's working =)

Thanks everyone for your help.


2013/7/18 Jérôme Verdier <verdier.jerome66@gmail.com>

> Hi,
>
> Since we saw that we have to give arguments in RANK() function, i'm trying
> to translate this one (working on Oracle 10g) to be functionnally in Hive :
>
>
> RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit ORDER BY
> pnvente.dt_debut_commercial DESC,
> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>
> i try this :
>
> RANK(pnvente.dt_debut_commercial,
> COALESCE(pnvente.id_produit,dem.id_produit)) OVER (PARTITION BY
> mag.co_magasin, dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC,
> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>
> and this :
>
>
> RANK(pnvente.dt_debut_commercial, pnvente.id_produit, dem.id_produit)
> OVER (PARTITION BY mag.co_magasin, dem.id_produit ORDER BY
> pnvente.dt_debut_commercial DESC,
> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>
> But Hive is giving me another error :
>
> FAILED: SemanticException Failed to breakup Windowing invocations into
> Groups. At least 1 group must only depend on input columns. Also check for
> circular dependencies.
> Underlying error: Ranking Functions can take no arguments
>
> i don't understand this error, in the first try, he said that he can't
> work without arguments, and now, rank function is falling because of the
> arguments.
>
> what is wrong now ?
>
>
>
> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>
>> my bad ... in relational databases we generally do not give a column name
>> inside rank() ... but the one in (partition by .... order by..) is
>> sufficient.
>>
>> But looks like that's not the case in Hive
>>
>>
>> Jerome,
>>
>> Please look at the examples in link below. See if you are able to make it
>> work
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-PARTITIONBYwithpartitioning%2CORDERBY%2Candwindowspecification
>>
>>
>>
>> Cant help you beyond this as i don't have Hive 0.11 :-(
>>
>>
>> Richa
>>
>>
>> On Wed, Jul 17, 2013 at 3:08 PM, Jérôme Verdier <
>> verdier.jerome66@gmail.com> wrote:
>>
>>> Hi Richa,
>>>
>>> I have tried one query, with what i've understand of  Vijay's tips.
>>>
>>> SELECT code_entite, RANK(mag.me_vente_ht) OVER (PARTITION BY
>>> mag.co_societe ORDER BY  mag.me_vente_ht) AS rank FROM
>>> default.thm_renta_rgrp_produits_n_1 mag;
>>>
>>> This query is working, it gives me results.
>>>
>>> You say that maybe i'm hitting the same bug of JIRA HIVE-4663, but query
>>> is also failling when i put analytical columns in...
>>>
>>>
>>> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>>>
>>>> Vijay
>>>>
>>>> Jerome has already passed column -> mag.co_societe for rank.
>>>>
>>>> syntax -> RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>>> mag.me_vente_ht)
>>>> This will generate a rank for column mag.co_societe based on column
>>>> value me_vente_ht
>>>>
>>>> Jerome,
>>>>
>>>> Its possible you are also hitting the same bug as I mentioned in my
>>>> email before.
>>>>
>>>>
>>>> Richa
>>>>
>>>>
>>>> On Wed, Jul 17, 2013 at 2:31 PM, Vijay <techvd@gmail.com> wrote:
>>>>
>>>>> As the error message states: "One ore more arguments are expected,"
>>>>> you have to pass a column to the rank function.
>>>>>
>>>>>
>>>>> On Wed, Jul 17, 2013 at 1:12 AM, Jérôme Verdier <
>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>
>>>>>> Hi Richa,
>>>>>>
>>>>>> I have tried a simple query without joins, etc....
>>>>>>
>>>>>> SELECT RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>>>>> mag.me_vente_ht),mag.co_societe, mag.me_vente_ht FROM
>>>>>> default.thm_renta_rgrp_produits_n_1 mag;
>>>>>>
>>>>>> Unfortunately, the error is the same like previously.
>>>>>>
>>>>>> Error: Query returned non-zero code: 40000, cause: FAILED:
>>>>>> SemanticException Failed to breakup Windowing invocations into Groups.
At
>>>>>> least 1 group must only depend on input columns. Also check for circular
>>>>>> dependencies.
>>>>>>
>>>>>> Underlying error:
>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>>>> arguments are expected.
>>>>>> SQLState:  42000
>>>>>> ErrorCode: 40000
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/7/17 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>
>>>>>>> Jerome
>>>>>>>
>>>>>>> I would recommend that you try Rank function with columns from
just
>>>>>>> one table first.
>>>>>>> Once it is established that rank is working fine then add all
the
>>>>>>> joins.
>>>>>>>
>>>>>>> I am still on Hive 0.10 so cannot test it myself.
>>>>>>> However, I can find a similar issue on following link - so its
>>>>>>> possible you are facing issues due to this reported bug.
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/HIVE-4663
>>>>>>>
>>>>>>>
>>>>>>> Richa
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 16, 2013 at 6:41 PM, Jérôme Verdier <
>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>
>>>>>>>> You can see my query below :
>>>>>>>>
>>>>>>>> SELECT
>>>>>>>>             mag.co_magasin,
>>>>>>>>             dem.id_produit                              
   as
>>>>>>>> id_produit_orig,
>>>>>>>>             pnvente.dt_debut_commercial                 
   as
>>>>>>>> dt_debut_commercial,
>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit) 
   as
>>>>>>>> id_produit,
>>>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin,
>>>>>>>> dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC,
>>>>>>>> COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang
>>>>>>>>
>>>>>>>> FROM default.demarque_mag_jour dem
>>>>>>>>
>>>>>>>>           LEFT OUTER JOIN default.produit_norm pn
>>>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>>>           LEFT OUTER JOIN default.produit_norm pnvente
>>>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>>>           INNER JOIN default.kpi_magasin mag
>>>>>>>>           ON mag.id_magasin = dem.id_magasin
>>>>>>>>
>>>>>>>>
>>>>>>>> GROUP BY
>>>>>>>>             mag.co_magasin,
>>>>>>>>             dem.id_produit,
>>>>>>>>             pnvente.dt_debut_commercial,
>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit);
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>>>
>>>>>>>>> Can you share query with just RANK().
>>>>>>>>>
>>>>>>>>> Richa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 16, 2013 at 6:08 PM, Jérôme Verdier <
>>>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Richa,
>>>>>>>>>>
>>>>>>>>>> I tried to execute the rank function alone, but the
result is the
>>>>>>>>>> same
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013/7/16 Richa Sharma <mailtorichasharma@gmail.com>
>>>>>>>>>>
>>>>>>>>>>> Hi Jerome
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think the problem is you are trying to use
MIN, SUM and RANK
>>>>>>>>>>> function in a single query.
>>>>>>>>>>>
>>>>>>>>>>> Try to get the rank first in a query and on top
of it apply
>>>>>>>>>>> these aggregate functions
>>>>>>>>>>>
>>>>>>>>>>> Richa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 16, 2013 at 2:15 PM, Jérôme Verdier
<
>>>>>>>>>>> verdier.jerome66@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a problem while using RANK OVER PARTITION
function with
>>>>>>>>>>>> Hive.
>>>>>>>>>>>>
>>>>>>>>>>>> Hive is in version 0.11 and, as we can see
here :
>>>>>>>>>>>> https://cwiki.apache.org/Hive/languagemanual-windowingandanalytics.html,
>>>>>>>>>>>> we can now use these functions in Hive.
>>>>>>>>>>>>
>>>>>>>>>>>> But, when i use it, i encountered this error
:
>>>>>>>>>>>>
>>>>>>>>>>>> FAILED: SemanticException Failed to breakup
Windowing
>>>>>>>>>>>> invocations into Groups. At least 1 group
must only depend on input
>>>>>>>>>>>> columns. Also check for circular dependencies.
>>>>>>>>>>>> Underlying error:
>>>>>>>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException:
One or more
>>>>>>>>>>>> arguments are expected.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my script :
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT
>>>>>>>>>>>>             mag.co_magasin,
>>>>>>>>>>>>             dem.id_produit              
                   as
>>>>>>>>>>>> id_produit_orig,
>>>>>>>>>>>>             pnvente.dt_debut_commercial 
                   as
>>>>>>>>>>>> dt_debut_commercial,
>>>>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
    as
>>>>>>>>>>>> id_produit,
>>>>>>>>>>>>             min(
>>>>>>>>>>>>               CASE WHEN dem.co_validation
IS NULL THEN 0 ELSE 1
>>>>>>>>>>>> END
>>>>>>>>>>>>             )                           
                   as
>>>>>>>>>>>> flg_demarque_valide,
>>>>>>>>>>>>             sum(CASE WHEN dem.co_validation
IS NULL THEN 0 ELSE
>>>>>>>>>>>> CAST(dem.mt_revient_ope AS INT) END)
>>>>>>>>>>>>                                         
                   as
>>>>>>>>>>>> me_dem_con_prx_cs,
>>>>>>>>>>>>             0                           
                   as
>>>>>>>>>>>> me_dem_inc_prx_cs,
>>>>>>>>>>>>             0                           
                   as
>>>>>>>>>>>> me_dem_prov_stk_cs,
>>>>>>>>>>>>             sum(CASE WHEN dem.co_validation
IS NULL THEN 0 ELSE
>>>>>>>>>>>> CAST(dem.qt_demarque AS INT) END)
>>>>>>>>>>>>                                         
                   as
>>>>>>>>>>>> qt_dem_con,
>>>>>>>>>>>>             0                           
                   as
>>>>>>>>>>>> qt_dem_inc,
>>>>>>>>>>>>             0                           
                   as
>>>>>>>>>>>> qt_dem_prov_stk, -- !!!!!!!! VIRGULE
>>>>>>>>>>>>             RANK() OVER (PARTITION BY mag.co_magasin,
>>>>>>>>>>>> dem.id_produit ORDER BY pnvente.dt_debut_commercial
DESC,
>>>>>>>>>>>> COALESCE(pnvente.id_produit,dem.id_produit)
DESC) as rang
>>>>>>>>>>>>           from default.calendrier cal
>>>>>>>>>>>>           INNER JOIN default.demarque_mag_jour
dem
>>>>>>>>>>>>           ON  CASE WHEN dem.co_societe =
1 THEN 1 ELSE 2 END =
>>>>>>>>>>>> '${hiveconf:in_co_societe}' -- A modifier
>>>>>>>>>>>>           AND dem.dt_jour    = cal.dt_jour
>>>>>>>>>>>>           LEFT OUTER JOIN default.produit_norm
pn
>>>>>>>>>>>>           ON  pn.co_societe = dem.co_societe
>>>>>>>>>>>>           AND pn.id_produit = dem.id_produit
>>>>>>>>>>>>           LEFT OUTER JOIN default.produit_norm
pnvente
>>>>>>>>>>>>           ON  pnvente.co_societe = pn.co_societe
>>>>>>>>>>>>           AND pnvente.co_produit_rfu = pn.co_produit_lip
>>>>>>>>>>>>           AND pnvente.co_type_motif='05'
>>>>>>>>>>>>           INNER JOIN default.kpi_magasin
mag
>>>>>>>>>>>>           ON  mag.co_societe = '${hiveconf:in_co_societe}'
>>>>>>>>>>>>           AND mag.id_magasin = dem.id_magasin
>>>>>>>>>>>>           WHERE cal.dt_jour = '${hiveconf:in_dt_jour}'
>>>>>>>>>>>>           AND NOT (dem.co_validation IS NULL
AND cal.dt_jour >
>>>>>>>>>>>> unix_timestamp()-3*60*60*24) -- A verifier
>>>>>>>>>>>>           -- JYP 4.4
>>>>>>>>>>>>           AND dem.co_operation_magasin IN
('13','14','32')
>>>>>>>>>>>>           GROUP BY
>>>>>>>>>>>>             mag.co_magasin,
>>>>>>>>>>>>             dem.id_produit,
>>>>>>>>>>>>             pnvente.dt_debut_commercial,
>>>>>>>>>>>>             COALESCE(pnvente.id_produit,dem.id_produit)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Jérôme VERDIER*
>>>>>>>>>>>> 06.72.19.17.31
>>>>>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Jérôme VERDIER*
>>>>>>>>>> 06.72.19.17.31
>>>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Jérôme VERDIER*
>>>>>>>> 06.72.19.17.31
>>>>>>>> verdier.jerome66@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Jérôme VERDIER*
>>>>>> 06.72.19.17.31
>>>>>> verdier.jerome66@gmail.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Jérôme VERDIER*
>>> 06.72.19.17.31
>>> verdier.jerome66@gmail.com
>>>
>>>
>>
>
>
> --
> *Jérôme VERDIER*
> 06.72.19.17.31
> verdier.jerome66@gmail.com
>
>


-- 
*Jérôme VERDIER*
06.72.19.17.31
verdier.jerome66@gmail.com

Mime
View raw message