hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-732) Utility UDFs
Date Mon, 30 Mar 2009 09:22:50 GMT

     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:

    Attachment: udf.v3.patch

> You assume that you are getting ....
1. Fixed. Implemented getArgToFuncMapping() as suggested.

> In the exec function, you check for 2 elements ....
2. Fixed. Changed to check for 3 arguments.

> Looks like if you inserted .....
Yes, that is the desired behavior. The goal is to keep the top-N tuples and discard the rest.
The head of the queue in this case is the minimum element, so it is the one evicted when a larger tuple arrives.
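
The bounded-queue behavior described above can be sketched as follows. This is a minimal illustration only, not the code in udf.v3.patch: it models tuples as plain long[] arrays instead of Pig Tuple objects, and the class name TopNSketch and method name topN are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for the TopN idea; tuples are modeled as long[] (an assumption).
class TopNSketch {

    // Keeps the n tuples with the largest value in column `field`,
    // returned in descending order of that column.
    static List<long[]> topN(int n, int field, List<long[]> bag) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<long[]> byField = Comparator.comparingLong((long[] t) -> t[field]);

        // Min-heap: the head is always the smallest tuple currently retained.
        PriorityQueue<long[]> heap = new PriorityQueue<>(n, byField);
        for (long[] tuple : bag) {
            if (heap.size() < n) {
                heap.add(tuple);
            } else if (tuple[field] > heap.peek()[field]) {
                heap.poll();      // throw away the current minimum
                heap.add(tuple);  // keep the larger tuple instead
            }
        }

        List<long[]> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }
}
```

Because the queue never holds more than N elements, memory use stays bounded by N regardless of the size of the input bag, and the input does not need to be sorted.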

> You are not specifying tuple structure....
The output tuple structure is really based upon the tuple structure in the input bag (field(2)).
I have changed this slightly as per my understanding. If this can be better written, please

> Utility UDFs 
> -------------
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch, udf.v2.patch, udf.v3.patch
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long)
> to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the top
> N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google,
> AOL, Live) and extracts and normalizes the search query present in it.
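
A rough sketch of the SearchQuery idea follows, using only the Java standard library. It is not the code from the patch. The query parameter names per engine (q, p, query), the loose host matching, and the normalization rule (lower-casing and collapsing whitespace) are all assumptions made for illustration, as are the names SearchQuerySketch and extract.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical stand-in for the SearchQuery idea; not the patch's implementation.
class SearchQuerySketch {

    // Assumed mapping from search engine host to its query parameter name.
    // Matching is a loose substring test, good enough for a sketch only.
    static String paramFor(String host) {
        if (host.contains("google.")) return "q";
        if (host.contains("yahoo."))  return "p";
        if (host.contains("aol."))    return "query";
        if (host.contains("live."))   return "q";
        return null; // not one of the four engines
    }

    // Returns the decoded, normalized search query, or null if none is found.
    static String extract(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return null; // not a parseable URL
        }
        String host = uri.getHost();
        String rawQuery = uri.getRawQuery();
        if (host == null || rawQuery == null) return null;

        String param = paramFor(host.toLowerCase(Locale.ROOT));
        if (param == null) return null;

        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                try {
                    String decoded = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                    // Assumed normalization: lower-case, trim, collapse whitespace.
                    return decoded.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
                } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                    return null; // malformed percent-encoding
                }
            }
        }
        return null;
    }
}
```

Returning null for unrecognized hosts or missing parameters is one possible design choice; a Pig UDF could equally emit an empty string for such rows.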
