pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-732) Utility UDFs
Date Wed, 25 Mar 2009 13:10:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689104#action_12689104 ]

Ankur commented on PIG-732:

Thanks for the quick review.
> (1) Pig already support limit operator ....
I have a relation that I need to group by field-1 while retaining only the top-N occurrences
of field-2. To do that, I group by (field-1, field-2), generate counts, and flatten the result
into tuples of the form (field-1, field-2, <count>). I then group on field-1 again and retain
just the top-N tuples in each group. So what I actually need is to project bags of limited
size. I don't think this can be done using LIMIT, as it is not allowed inside FOREACH.
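
To make the requirement concrete, here is a minimal sketch in plain Java of the "keep only
the top N tuples of a bag" step. It is not the code from the attached patch and does not use
Pig's Tuple/DataBag classes; the class name TopNSketch, the topN signature, and the use of
Object[] as a stand-in for a tuple are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a TopN-style UDF; Object[] plays the role of a tuple.
class TopNSketch {

    /**
     * Returns at most n tuples from the bag, chosen by the largest numeric value
     * in column fieldIndex, ordered from largest to smallest.
     */
    static List<Object[]> topN(int n, int fieldIndex, List<Object[]> bag) {
        Comparator<Object[]> byField =
                Comparator.comparingLong(t -> ((Number) t[fieldIndex]).longValue());

        // Min-heap bounded to n entries: the smallest retained tuple is evicted first.
        PriorityQueue<Object[]> heap = new PriorityQueue<>(byField);
        for (Object[] tuple : bag) {
            heap.offer(tuple);
            if (heap.size() > n) {
                heap.poll();
            }
        }

        List<Object[]> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }

    public static void main(String[] args) {
        // One group's bag of (field-1, field-2, count) tuples.
        List<Object[]> bag = new ArrayList<>();
        bag.add(new Object[] {"a", "x", 5L});
        bag.add(new Object[] {"a", "y", 9L});
        bag.add(new Object[] {"a", "z", 1L});

        for (Object[] t : topN(2, 2, bag)) {
            System.out.println(t[0] + "," + t[1] + "," + t[2]);
        }
    }
}
```

Applied to the bag produced by the second group on field-1, with the count column as the
comparison field, this yields the limited-size bag described above. The bounded heap keeps
memory proportional to N rather than to the size of the bag.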

> (2) Filtering UDFs are meant to be used as ....
Moved TopN and SearchQuery UDFs to piggyBank/evaluation/util. Also moved the test cases to
the appropriate location.

> (3) Each file included needs to have Apache license header ....

> Utility UDFs 
> -------------
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch, udf.v2.patch
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long)
> to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the
> top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google,
> AOL, Live) and extracts and normalizes the search query present in it.
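
As a rough illustration of the SearchQuery description above, here is a minimal sketch in
plain Java. It is not the code from the attached patch. The class and method names, the
host-to-parameter mapping (q for Google and Live, p for Yahoo, query for AOL), and the
normalization steps (URL-decode, lower-case, collapse whitespace) are all assumptions; the
issue does not say how the UDF identifies the engine or what "normalizes" means.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical stand-in for a SearchQuery-style UDF.
class SearchQuerySketch {

    /** Returns the normalized search query, or null if none can be extracted. */
    static String extractQuery(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return null; // not a syntactically valid URI
        }
        String host = uri.getHost();
        String rawQuery = uri.getRawQuery();
        if (host == null || rawQuery == null) {
            return null;
        }
        String param = queryParamFor(host.toLowerCase(Locale.ROOT));
        if (param == null) {
            return null; // not one of the four assumed engines
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                return normalize(pair.substring(eq + 1));
            }
        }
        return null;
    }

    // Assumed mapping from host to query parameter; substring matching is deliberately naive.
    private static String queryParamFor(String host) {
        if (host.contains("yahoo.")) return "p";
        if (host.contains("google.")) return "q";
        if (host.contains("aol.")) return "query";
        if (host.contains("live.")) return "q";
        return null;
    }

    // Assumed normalization: URL-decode, trim, lower-case, collapse runs of whitespace.
    private static String normalize(String encoded) {
        try {
            String decoded = URLDecoder.decode(encoded, "UTF-8");
            return decoded.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return null; // malformed percent-encoding
        }
    }

    public static void main(String[] args) {
        System.out.println(extractQuery("http://www.google.com/search?hl=en&q=Apache+Pig%20UDF"));
        System.out.println(extractQuery("http://search.yahoo.com/search?p=Hadoop++Pig&ei=UTF-8"));
    }
}
```

Returning null for unrecognized hosts or malformed input is a design choice of this sketch,
made so that bad records can be filtered out downstream instead of failing the job.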

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
