hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-732) Utility UDFs
Date Wed, 25 Mar 2009 13:10:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689104#action_12689104 ]

Ankur commented on PIG-732:
---------------------------

Olga,
Thanks for the quick review.
> (1) Pig already support limit operator ....
I have a relation that I need to group by field-1, retaining only the top-N most frequent
values of field-2 in each group. So I group by (field-1, field-2), generate counts, and
flatten the result into tuples of the form (field-1, field-2, <count>). I then group on
field-1 again and retain just the top-N tuples, so I actually need to project bags of
limited size. I don't think this can be done with LIMIT, as it is not allowed inside FOREACH.
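
The two-pass grouping described above can be sketched in plain Java. This is only an illustration of the data flow, not Pig Latin and not the TopN UDF from the patch: the class and method names are hypothetical, and each input row is assumed to be a two-element string array holding (field-1, field-2).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and do not come from the patch.
class TopNPerGroupSketch {

    /**
     * For each distinct field-1 value, returns at most n (field-2, count)
     * pairs ordered by descending count. Assumes n >= 0 and that every
     * row is a two-element array {field-1, field-2}.
     */
    static Map<String, List<Map.Entry<String, Long>>> topNPerGroup(
            List<String[]> rows, int n) {
        // Pass 1: group by (field-1, field-2) and count occurrences.
        Map<String, Map<String, Long>> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[0], k -> new LinkedHashMap<>())
                  .merge(row[1], 1L, Long::sum);
        }

        // Pass 2: regroup on field-1 and keep only the n largest counts,
        // which is the "bag of limited size" for that group.
        Map<String, List<Map.Entry<String, Long>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Long>> group : counts.entrySet()) {
            List<Map.Entry<String, Long>> bag =
                    new ArrayList<>(group.getValue().entrySet());
            // List.sort is stable, so ties keep their first-seen order.
            bag.sort(Map.Entry.<String, Long>comparingByValue().reversed());
            int keep = Math.min(n, bag.size());
            result.put(group.getKey(), new ArrayList<>(bag.subList(0, keep)));
        }
        return result;
    }
}
```

The second pass is the step that LIMIT cannot express inside FOREACH: the truncation has to happen per group rather than on the relation as a whole.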

> (2) Filtering UDFs are meant to be used as ....
Moved the TopN and SearchQuery UDFs to piggyBank/evaluation/util. Also moved the test cases
to the appropriate location.

> (3) Each file included needs to have Apache license header ....
Done.



> Utility UDFs 
> -------------
>
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch, udf.v2.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field number
> (type long) to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag
> containing the top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google,
> AOL, Live) and extracts and normalizes the search query present in it.
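
As a rough illustration of what a SearchQuery-style extraction might involve, here is a minimal Java sketch. It is not the UDF from the patch. The class name, the mapping from engine to query parameter ("p" for Yahoo, "q" for the others), and the normalization rules (URL-decode, lower-case, collapse whitespace) are all assumptions made for this example; the issue description does not specify them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; parameter names and normalization are assumptions.
class SearchQuerySketch {

    // Assumed mapping from a host-name label to its query parameter.
    private static final Map<String, String> PARAM_BY_ENGINE = new HashMap<>();
    static {
        PARAM_BY_ENGINE.put("yahoo", "p");
        PARAM_BY_ENGINE.put("google", "q");
        PARAM_BY_ENGINE.put("aol", "q");
        PARAM_BY_ENGINE.put("live", "q");
    }

    /** Returns the normalized search query, or null if none can be found. */
    static String extract(String url) {
        if (url == null) {
            return null;
        }
        URI uri;
        try {
            uri = URI.create(url.trim());
        } catch (IllegalArgumentException e) {
            return null; // not a syntactically valid URI
        }
        String host = uri.getHost();
        String rawQuery = uri.getRawQuery();
        if (host == null || rawQuery == null) {
            return null;
        }

        // Find the engine by looking at each dot-separated host label.
        String param = null;
        for (String label : host.toLowerCase(Locale.ROOT).split("\\.")) {
            if (PARAM_BY_ENGINE.containsKey(label)) {
                param = PARAM_BY_ENGINE.get(label);
                break;
            }
        }
        if (param == null) {
            return null; // not one of the assumed engines
        }

        // Scan the raw query string for the engine's parameter.
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                try {
                    String decoded = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                    // Assumed normalization: trim, lower-case, collapse whitespace.
                    return decoded.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
                } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                    return null; // malformed percent-encoding
                }
            }
        }
        return null;
    }
}
```

Matching on whole host labels rather than substrings avoids treating an unrelated host such as olive.example as a Live URL, though a real implementation would likely need stricter host checks.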

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

