tez-issues mailing list archives

From "Krisztian Horvath (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-1608) TopK example
Date Thu, 23 Oct 2014 09:01:33 GMT

     [ https://issues.apache.org/jira/browse/TEZ-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Horvath updated TEZ-1608:
-----------------------------------
    Attachment: TEZ-1608-3.patch

> TopK example
> ------------
>
>                 Key: TEZ-1608
>                 URL: https://issues.apache.org/jira/browse/TEZ-1608
>             Project: Apache Tez
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: Janos Matyas
>            Assignee: Krisztian Horvath
>         Attachments: TEZ-1608-1.patch, TEZ-1608-2.patch, TEZ-1608-3.patch
>
>
> The goal of this sample is to find the top K elements of a dataset while guiding the reader through the basics of Tez (DAG creation, tokenizers, custom comparators and parallelism).
> An example use case for top K:
>   Given a large data set in CSV format of user comments on a site, with each row listed as userid,postid,commentid,comment,timestamp, we are looking for the top K commenters or the posts with the most comments.
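For readers who want a feel for the computation before opening the patch, here is a minimal single-process sketch of the top K commenters use case in plain Java. It is not the code from TEZ-1608-3.patch and uses no Tez APIs; the class name TopKSketch, the method topKCommenters, and the tie-breaking rule (smaller userid wins on equal counts) are assumptions made for illustration. It counts comments per userid, then keeps a min-heap of at most K entries ordered by a custom comparator. The CSV parsing is deliberately naive and assumes the userid field contains no commas or quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from the TEZ-1608 patch.
public class TopKSketch {

    // Returns up to k (userid, commentCount) entries, most active commenter first.
    static List<Map.Entry<String, Integer>> topKCommenters(List<String> csvLines, int k) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }

        // Step 1: tokenize each row and count comments per userid (first field).
        Map<String, Integer> counts = new HashMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",", 5); // naive split, no quote handling
            if (fields.length < 5) {
                continue; // skip malformed rows
            }
            counts.merge(fields[0], 1, Integer::sum);
        }

        // Step 2: custom comparator that puts the weakest candidate first:
        // lower count first; on equal counts, the larger userid is weaker.
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };

        // Step 3: bounded min-heap; evict the weakest entry once size exceeds k.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.add(entry);
            if (heap.size() > k) {
                heap.poll();
            }
        }

        // Step 4: the heap drains weakest-first, so reverse to get strongest-first.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "u1,p1,c1,hello,100",
            "u2,p1,c2,hi,101",
            "u1,p2,c3,again,102",
            "u3,p2,c4,yo,103",
            "u1,p3,c5,more,104",
            "u2,p3,c6,ok,105");
        for (Map.Entry<String, Integer> e : topKCommenters(rows, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Counting by postid (the second field) instead of userid would give the posts with the most comments. In a distributed setting the counting and the final top K selection would presumably run as separate stages, which is where the DAG creation and parallelism mentioned in the description come in.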



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
