hadoop-pig-dev mailing list archives

From "Amir Youssefi (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-171) Top K
Date Fri, 28 Mar 2008 04:00:24 GMT
Top K

                 Key: PIG-171
                 URL: https://issues.apache.org/jira/browse/PIG-171
             Project: Pig
          Issue Type: New Feature
            Reporter: Amir Youssefi

Users are frequently interested in top results (especially the Top K rows). This can be implemented
efficiently in Pig/MapReduce to deliver results quickly with low network bandwidth and memory usage.
The key point is to prune the data on the map side and keep only the small set of rows that meet the
Top K criterion. We can do this in an Algebraic function (combiner) that outputs multiple values, so
only a small data set leaves each mapper node.
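
A minimal sketch of the map-side pruning described above, written in plain Python rather than against the Pig API. The row shape (key, score), the function names top_k_partial and top_k_merge, and the sample splits are all hypothetical and only illustrate the idea: each mapper/combiner emits at most K rows, and the reducer merges those small lists and prunes once more.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (key, score)


def top_k_partial(rows: Iterable[Row], k: int) -> List[Row]:
    """Map/combiner side: keep only the k rows with the highest score."""
    return heapq.nlargest(k, rows, key=lambda r: r[1])


def top_k_merge(partials: Iterable[List[Row]], k: int) -> List[Row]:
    """Reduce side: concatenate the small partial lists and prune again."""
    merged = [row for part in partials for row in part]
    return heapq.nlargest(k, merged, key=lambda r: r[1])


# Two hypothetical input splits, each handled by a different mapper.
split_a = [("a", 5), ("b", 9), ("c", 1), ("d", 7)]
split_b = [("e", 8), ("f", 2), ("g", 10), ("h", 3)]

# Each mapper ships at most k rows instead of its whole split.
partials = [top_k_partial(split_a, 2), top_k_partial(split_b, 2)]
result = top_k_merge(partials, 2)
```

Because the top K of the whole input is always contained in the union of the per-split top K lists, pruning early does not change the final answer; it only reduces what crosses the network.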

The same idea applies to variants of this problem:

  - An Algebraic Function for 'Top K Rows'
  - An Algebraic Function for 'Top K' values ('Top Rank K' and 'Top Dense Rank K')
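
The issue does not define 'Top Rank K' and 'Top Dense Rank K'. The sketch below assumes SQL-style RANK and DENSE_RANK semantics (ties share a rank; RANK leaves gaps after ties, DENSE_RANK does not). The function names and sample data are hypothetical, and this is plain Python, not Pig code.

```python
from typing import List


def top_rank_k(values: List[int], k: int) -> List[int]:
    """Values whose rank is <= k (ties share a rank, gaps follow ties)."""
    if k <= 0:
        return []
    ordered = sorted(values, reverse=True)
    if len(ordered) <= k:
        return ordered
    cutoff = ordered[k - 1]  # k-th largest value, counting duplicates
    return [v for v in ordered if v >= cutoff]


def top_dense_rank_k(values: List[int], k: int) -> List[int]:
    """Values that belong to the k largest distinct values."""
    if k <= 0 or not values:
        return []
    ordered = sorted(values, reverse=True)
    cutoff = sorted(set(values), reverse=True)[:k][-1]  # k-th distinct value
    return [v for v in ordered if v >= cutoff]


scores = [10, 10, 9, 8, 8, 7]
rank_top3 = top_rank_k(scores, 3)         # ranks are 1, 1, 3, 4, 4, 6
dense_top3 = top_dense_rank_k(scores, 3)  # dense ranks are 1, 1, 2, 3, 3, 4
```

Under these assumptions both variants can be applied per split and then again to the combined partial results, since any value that qualifies globally also qualifies within its own split. Unlike Top K Rows, the output size is not bounded by K when there are many ties.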

In other words, the implementation is similar to combiners for aggregate functions, but instead of
one value we get multiple values.

I will add a sample implementation for Top K Rows and possibly TOP K ORDER BY to clarify details.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
