pig-dev mailing list archives

From "Giuseppe Santoro (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3619) Provide XPath function
Date Sat, 13 Sep 2014 10:39:33 GMT

     [ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Giuseppe Santoro updated PIG-3619:
    Attachment: xpath2.patch

I tried to use this UDF but got exceptions related to the function mapping definition. The
patch declares just one parameter there, while the UDF takes at least two mandatory parameters
and one optional one. I have fixed that in my new patch, xpath2.patch, which is attached to this
ticket. I have been running the UDF with hundreds of XPath queries and it works really well,
even with the optional parameter.

> Provide XPath function
> ----------------------
>                 Key: PIG-3619
>                 URL: https://issues.apache.org/jira/browse/PIG-3619
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Saad Patel
>            Assignee: Saad Patel
>             Fix For: 0.13.0
>         Attachments: xpath.patch, xpath2.patch
> XML is often loaded using XMLLoader with a record boundary tag as one of the parameters.
> A common use case is to then extract data from those records. XPath would allow those extractions
> to be done very easily. I'm proposing a patch that adds simple XPath support as a UDF.
> Example usage of the XPath UDF would be:
> {code}
> extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record,
> {code}
> The proposed UDF also caches the last XML document. This helps performance
> when multiple consecutive XPath extractions run on the same XML document, such as the example
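
The caching idea described above can be sketched in plain Java with the JDK's own XPath
support. This is only an illustration under stated assumptions, not the code from xpath.patch
or xpath2.patch: the class name CachingXPathExtractor, the extract method, and the parse
counter are hypothetical, and the sketch does not depend on Pig at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not taken from the attached patches: it evaluates an
// XPath expression against an XML string and remembers the last parsed document.
class CachingXPathExtractor {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastXml;        // XML text of the most recently parsed record
    private Document lastDocument; // parsed form of lastXml
    private int parseCount;        // number of actual parses, for illustration only

    String extract(String xml, String expression) {
        try {
            // Re-parse only when the record differs from the previous call.
            if (lastDocument == null || !xml.equals(lastXml)) {
                DocumentBuilder builder =
                        DocumentBuilderFactory.newInstance().newDocumentBuilder();
                lastDocument = builder.parse(new InputSource(new StringReader(xml)));
                lastXml = xml;
                parseCount++;
            }
            // Evaluate relative to the document node and return the string value.
            return xpath.evaluate(expression, lastDocument);
        } catch (Exception e) {
            throw new IllegalArgumentException(
                    "Could not evaluate XPath: " + expression, e);
        }
    }

    int getParseCount() {
        return parseCount;
    }
}
```

With a helper like this, two consecutive extractions on the same record (for example
'book/author' followed by another path under 'book') parse the XML once; the second call
reuses the cached document.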

This message was sent by Atlassian JIRA
