hadoop-pig-dev mailing list archives

From "Amir Youssefi (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-171) Top K
Date Fri, 28 Mar 2008 04:00:24 GMT
Top K
-----

                 Key: PIG-171
                 URL: https://issues.apache.org/jira/browse/PIG-171
             Project: Pig
          Issue Type: New Feature
            Reporter: Amir Youssefi


Users are frequently interested in top results, especially the top K rows. This can be implemented
efficiently in Pig/MapReduce to deliver results quickly with low network bandwidth and memory
usage.
 
The key point is to prune data on the map side and keep only the small set of rows that meet the
top-K criterion. We can do this in an Algebraic function (combiner) that outputs multiple values,
so only a small data set leaves each mapper node.
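
The map-side pruning described above can be sketched outside Pig in a few lines of Python. This is only an illustration of the idea, not the proposed Pig implementation: the function names `top_k` and `combine`, the sample splits, and the use of plain integers as rows are all assumptions made for the example.

```python
import heapq

def top_k(rows, k, key=lambda row: row):
    """Return at most k rows with the largest keys, largest first."""
    return heapq.nlargest(k, rows, key=key)

def combine(partials, k, key=lambda row: row):
    """Merge partial top-k lists and prune the merged rows back to k."""
    merged = [row for partial in partials for row in partial]
    return top_k(merged, k, key)

# Hypothetical input splits, one list per mapper.
splits = [[5, 1, 9, 3], [7, 8, 2], [4, 10, 6]]
k = 3

# Map/combine side: each mapper emits at most k rows.
partials = [top_k(split, k) for split in splits]

# Reduce side: merging the small partial lists gives the global top k.
result = combine(partials, k)
print(result)  # [10, 9, 8]
```

Because the top K of a union is always contained in the union of the per-split top K lists, pruning early does not change the final answer; it only limits each mapper's output to K rows.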

The same idea is applicable to solve variants of this problem:

  - An Algebraic Function for 'Top K Rows'
  - An Algebraic Function for 'Top K' values ('Top Rank K' and 'Top Dense Rank K')
  - TOP K ORDER BY.
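
The issue does not define these variants, so the following Python sketch shows one possible reading of how they differ on ties. It assumes 'Top Rank K' and 'Top Dense Rank K' follow the semantics of SQL's RANK and DENSE_RANK; the function names and sample scores are hypothetical, and k is assumed to be at least 1.

```python
def top_k_rows(values, k):
    """Exactly k rows (or fewer if the input is small); ties are cut arbitrarily."""
    return sorted(values, reverse=True)[:k]

def top_rank_k(values, k):
    """Rows whose rank is at most k, where tied values share a rank
    and the following ranks are skipped (1, 2, 2, 2, 5, ...)."""
    ordered = sorted(values, reverse=True)
    if len(ordered) <= k:
        return ordered
    cutoff = ordered[k - 1]  # every value tied with the k-th row is kept
    return [v for v in ordered if v >= cutoff]

def top_dense_rank_k(values, k):
    """Rows whose value is among the k largest distinct values
    (ranks 1, 2, 2, 2, 3, ... with no gaps)."""
    keep = set(sorted(set(values), reverse=True)[:k])
    return [v for v in sorted(values, reverse=True) if v in keep]

scores = [50, 40, 40, 40, 30, 30, 20]  # hypothetical data
print(top_k_rows(scores, 3))        # [50, 40, 40]
print(top_rank_k(scores, 3))        # [50, 40, 40, 40]
print(top_dense_rank_k(scores, 3))  # [50, 40, 40, 40, 30, 30]
```

Under this reading, only 'Top K Rows' has a fixed output size; the two rank-based variants can return more than K rows when values tie.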

In other words, the implementation is similar to combiners for aggregate functions, but instead of
one value we get multiple values.

I will add a sample implementation for Top K Rows and possibly TOP K ORDER BY to clarify details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

