hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-732) Utility UDFs
Date Wed, 25 Mar 2009 10:24:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689069#action_12689069 ]

Olga Natkovich commented on PIG-732:
------------------------------------

Ankur,

Thanks for contributing UDFs to PiggyBank!

A couple of questions/comments on your patch:

(1) Pig already supports the limit operator. Would that serve your needs for TopN, or do you
actually need to project bags of limited size in foreach?
(2) Filtering UDFs are meant to be used as predicates in filter operators and as such should
return Boolean values. I think your TopN should be in the evaluation/util group.
(3) Each included file needs to have the Apache license header. You can just copy it from one
of the other files.
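
To make the distinction in (1) concrete: limit caps the size of a whole relation, whereas TopN as described in the issue works on a single bag and keeps the N tuples ranked highest by one field. A minimal sketch of that per-bag selection in plain Java follows. It is not the code from udf.v1.patch and does not use the Pig API; the class name TopNSketch, the tuple() helper, and the modelling of a tuple as a List<Object> are all hypothetical, and it assumes the comparison field holds a numeric (long) value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a TopN evaluation UDF; a "tuple" is modelled as List<Object>.
final class TopNSketch {

    // Convenience helper (an assumption for this sketch) to build a tuple.
    static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    // Returns up to n tuples with the largest long value at fieldIndex, largest first.
    static List<List<Object>> topN(int n, int fieldIndex, List<List<Object>> bag) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<List<Object>> byField =
            Comparator.comparingLong(t -> ((Number) t.get(fieldIndex)).longValue());
        // Min-heap holding at most n tuples, so the input bag does not need to be sorted.
        PriorityQueue<List<Object>> heap = new PriorityQueue<>(byField);
        for (List<Object> t : bag) {
            heap.offer(t);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest of the n + 1 candidates
            }
        }
        List<List<Object>> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            tuple("a", 5L), tuple("b", 9L), tuple("c", 1L), tuple("d", 7L));
        System.out.println(topN(2, 1, bag)); // prints [[b, 9], [d, 7]]
    }
}
```

Because the function returns a bag rather than a Boolean, it also illustrates why (2) suggests placing it with the evaluation UDFs rather than the filtering ones.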




> Utility UDFs 
> -------------
>
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long)
> to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the
> top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google,
> AOL, Live) and extracts and normalizes the search query present in it.
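
For illustration only, the SearchQuery behaviour described above might look roughly like the plain-Java sketch below. It is not taken from udf.v1.patch and does not use the Pig API. The class name SearchQuerySketch, the parameter names q, p and query, and the normalization steps (URL-decode, trim, lower-case, collapse whitespace) are assumptions about what "extracts and normalizes" could mean here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a SearchQuery UDF.
final class SearchQuerySketch {

    // Assumed query-string parameter names; not confirmed by the issue text.
    private static final List<String> QUERY_PARAMS = Arrays.asList("q", "p", "query");

    // Returns the normalized search query, or null if none can be extracted.
    static String extract(String url) {
        String rawQuery;
        try {
            rawQuery = URI.create(url).getRawQuery();
        } catch (IllegalArgumentException e) {
            return null; // not a syntactically valid URI
        }
        if (rawQuery == null) {
            return null;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || !QUERY_PARAMS.contains(pair.substring(0, eq))) {
                continue;
            }
            String decoded;
            try {
                // Decodes %XX escapes and turns '+' into a space.
                decoded = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                return null; // malformed escape sequence
            }
            String normalized =
                decoded.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
            return normalized.isEmpty() ? null : normalized;
        }
        return null;
    }

    public static void main(String[] args) {
        // prints: apache pig udf
        System.out.println(extract("http://www.google.com/search?hl=en&q=Apache+Pig%20UDF"));
    }
}
```

The sketch matches parameter names without looking at the host, which is a simplification; a stricter version would first check which search engine the URL belongs to.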

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

