hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-732) Utility UDFs
Date Wed, 25 Mar 2009 16:39:02 GMT

    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689153#action_12689153 ]

Olga Natkovich commented on PIG-732:
------------------------------------

Ankur,

A couple of additional comments:

(1) Top N

- You assume that you are getting data in as bytearrays (for n and fieldNum). It would be better
to assume the actual types (int) and let Pig do the conversion for you, because then your
function will be able to work with data of different types. You do that by adding a getArgToFuncMapping
function. You can see examples in other functions in the repository, and an explanation
of its usage in the UDF manual. This also applies to your second UDF.
- In the exec function, you check for 2 elements in the tuple but you are accessing
- It looks like if you insert too many elements, you will be throwing away the head of the queue.
Is that what you want?
- You are not specifying the tuple structure in your schema definition. This could cause problems
for some of your queries.
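Whether dropping the head of the queue is correct depends on how the queue is ordered. The sketch below assumes TopN keeps a java.util.PriorityQueue ordered ascending on the comparison field (a min-heap). Under that assumption, polling the head once the queue grows past N is exactly right, because the head is the smallest row kept so far. If the queue were ordered descending instead, polling the head would discard the largest row. Rows are modeled as List<Long> rather than Pig tuples, and the names TopNSketch and topN are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    /**
     * Returns the n rows with the largest value at index fieldNum, largest first.
     * Rows are a hypothetical stand-in for Pig tuples.
     */
    static List<List<Long>> topN(int n, int fieldNum, Iterable<List<Long>> bag) {
        Comparator<List<Long>> byField = Comparator.comparing(row -> row.get(fieldNum));
        // Min-heap: the head is always the smallest of the rows kept so far.
        PriorityQueue<List<Long>> heap = new PriorityQueue<>(byField);
        for (List<Long> row : bag) {
            heap.add(row);
            if (heap.size() > n) {
                heap.poll(); // evict the current minimum; it cannot be in the top n
            }
        }
        // The heap's iteration order is arbitrary, so sort the survivors descending.
        List<List<Long>> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Long>> bag = List.of(
            List.of(1L, 40L), List.of(2L, 10L), List.of(3L, 30L), List.of(4L, 20L));
        System.out.println(topN(2, 1, bag)); // prints [[1, 40], [3, 30]]
    }
}
```

The heap never holds more than N + 1 rows, so memory stays bounded regardless of bag size, and the input bag does not need to be sorted.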




> Utility UDFs 
> -------------
>
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch, udf.v2.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long)
> to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the top
> N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the four search engines (Yahoo, Google,
> AOL, Live) and extracts and normalizes the search query it contains.
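A SearchQuery-style extraction might look roughly like the sketch below. Several assumptions go beyond the description: the host-to-parameter mapping (Google and Live use "q", Yahoo uses "p", AOL uses "query"), "encoded URL" meaning an ordinary URL whose query string is percent-encoded, and "normalizes" meaning lowercasing, trimming, and collapsing whitespace. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

public class SearchQuerySketch {
    // Assumed mapping from a host fragment to the parameter carrying the search terms.
    private static final Map<String, String> QUERY_PARAM = Map.of(
        "google.", "q",
        "yahoo.", "p",
        "live.", "q",
        "aol.", "query");

    /** Returns the normalized query, or null if the URL is not a recognized search URL. */
    static String extractQuery(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return null;
        }
        String host = uri.getHost();
        String rawQuery = uri.getRawQuery(); // still percent-encoded
        if (host == null || rawQuery == null) {
            return null;
        }
        String lowerHost = host.toLowerCase(Locale.ROOT);
        String param = null;
        for (Map.Entry<String, String> e : QUERY_PARAM.entrySet()) {
            if (lowerHost.contains(e.getKey())) {
                param = e.getValue();
                break;
            }
        }
        if (param == null) {
            return null;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0 || !pair.substring(0, eq).equals(param)) {
                continue;
            }
            String decoded;
            try {
                // Decodes %XX escapes and turns '+' into a space.
                decoded = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            } catch (IllegalArgumentException malformed) {
                return null;
            }
            // Assumed normalization: lowercase, trim, collapse internal whitespace.
            return decoded.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractQuery("http://www.google.com/search?hl=en&q=Hadoop+Pig%20%20UDF"));
        // prints "hadoop pig udf"
    }
}
```

Returning null for unrecognized input mirrors the common UDF convention of emitting a null rather than failing the whole job on one bad record.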

-- 
This message is automatically generated by JIRA.

