hivemall-issues mailing list archives

From takuti <...@git.apache.org>
Subject [GitHub] incubator-hivemall issue #108: [WIP][HIVEMALL-138] `to_ordered_map` UDAF wit...
Date Thu, 10 Aug 2017 07:38:17 GMT
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/108
  
    I tested `each_top_k`, `to_ordered_map` and `to_ordered_list` on the same MovieLens 1M
data. As expected, `to_ordered_map` cannot hold duplicate keys: with `rating` as the map key,
it returns only 3 entries (one per distinct rating) even though we requested a top-10 aggregation.
    
    ```sql
    with topk as (
        select
            each_top_k(
                10, userid, rating,
                userid, movieid
            ) as (rank, rating, userid, movieid)
        from (
            select
                userid, movieid, rating
            from ratings
            cluster by userid
        ) t
    )
    select 
        count(1), collect_list(array(movieid, rating))
    from 
        topk 
    where 
        userid = 1
    ;
    ```
    
    > 10      [[527.0,5.0],[3105.0,5.0],[1270.0,5.0],[48.0,5.0],[1035.0,5.0],[1193.0,5.0],[1287.0,5.0],[2355.0,5.0],[595.0,5.0],[2804.0,5.0]]
    
    ```sql
    with topk as (
        select 
            userid, 
            to_ordered_map(rating, movieid, 10) as movies
        from
            ratings
        group by 
            userid
    )
    select 
        count(1), collect_list(array(movieid, rating))
    from 
        topk
    lateral view explode(movies) t as rating, movieid
    where 
        userid = 1
    ;
    ```
    
    > 3       [[2028,5],[1246,4],[745,3]]
    
    ```sql
    with topk as (
        select 
            userid, 
            to_ordered_list(movieid, rating, '-k 10') as movies
        from
            ratings
        group by 
            userid
    )
    select 
        size(movies), movies
    from 
        topk
    where 
        userid = 1
    ;
    ```
    
    > 10      [595,1035,3105,2355,1287,2804,1193,2028,1029,1270]
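
The gap between the `to_ordered_map` and `to_ordered_list` results above comes down to how ties on `rating` are stored. The following is a minimal Python sketch of that effect, not Hivemall's implementation: `top_k_as_map` and `top_k_as_list` are hypothetical helpers, the rows are a small sample copied from the outputs above rather than the user's full rating history, and the way ties are broken here (last write wins in the map, input order in the list) is an assumption made for illustration.

```python
import heapq

# (movieid, rating) pairs copied from the query outputs above for userid = 1.
# This is only a sample, not the user's full rating history.
rows = [
    (527, 5), (3105, 5), (1270, 5), (48, 5), (1035, 5),
    (1193, 5), (1287, 5), (2355, 5), (595, 5), (2804, 5),
    (2028, 5), (1246, 4), (745, 3),
]


def top_k_as_map(rows, k):
    """Keep the k largest ratings in a map keyed by rating (hypothetical helper)."""
    by_rating = {}
    for movieid, rating in rows:
        # A map holds one value per key, so a movie with an already-seen
        # rating overwrites the previous entry instead of adding a new one.
        by_rating[rating] = movieid
    top_keys = sorted(by_rating, reverse=True)[:k]
    return {rating: by_rating[rating] for rating in top_keys}


def top_k_as_list(rows, k):
    """Keep the k highest-rated movies in a list, so ties survive (hypothetical helper)."""
    best = heapq.nlargest(k, rows, key=lambda row: row[1])
    return [movieid for movieid, _ in best]


as_map = top_k_as_map(rows, 10)
as_list = top_k_as_list(rows, 10)
print(len(as_map), as_map)
print(len(as_list), as_list)
```

With this sample, the map ends up with 3 entries because only three distinct ratings exist, while the list keeps all 10 requested items. The order of tied items in the list depends on the tie-breaking rule, which is why it need not match the order Hivemall returns.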


