hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1284) pig UDF is lacking XMLLoader. Plan to add the XMLLoader
Date Wed, 17 Mar 2010 00:25:27 GMT

     [ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1284:
--------------------------------


Since we are planning to branch for the release next Monday, 3/22, this feature needs to be ready
to commit by the end of this week. Otherwise, we should schedule it for the next release.

Please update the target version accordingly.

> pig UDF is lacking XMLLoader. Plan to add the XMLLoader
> -------------------------------------------------------
>
>                 Key: PIG-1284
>                 URL: https://issues.apache.org/jira/browse/PIG-1284
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Alok Singh
>             Fix For: 0.7.0
>
>         Attachments: pigudf_xmlLoader.patch, pigudf_xmlLoader.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi All,
>  We are planning to add an XMLLoader UDF to the piggybank repository.
> Here is the proposal, with the user docs:
>  XMLLoader is a load function for XML files.
>  It implements the LoadFunc interface, which is used to parse records
>  from a dataset.
>  It takes an XML tag name (xmlTag) as its argument and uses it to split the input dataset into
>  multiple records.
>  For example, if the input XML file (input.xml) looks like this:
>  <configuration>
>  <property>
>  <name> foobar </name>
>  <value> barfoo </value>
>  </property>
>  <ignoreProperty>
>  <name> foo </name>
>  </ignoreProperty>
>  <property>
>  <name> justname </name>
>  </property>
>  </configuration>
>  And your Pig script looks like this:
>  -- load the jar file
>  register loader.jar;
>  -- load the dataset using XMLLoader
>  -- A is a bag of tuples; each tuple contains one chararray field, doc (see the output below)
>  A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as (doc:chararray);
>  -- dump the result
>  dump A;
>  Then you will get the following output:
> (<property>
> <name> foobar </name>
> <value> barfoo </value>
> </property>)
> (<property>
> <name> justname </name>
> </property>)
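> where each pair of parentheses encloses one record. Note that <ignoreProperty> is skipped, because only <property> elements are extracted.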
>  
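The record-splitting behaviour described in the proposal can be illustrated with a small, self-contained Java sketch. It is hypothetical and is not the piggybank XMLLoader: the class name XmlTagSplitter and its split method are invented for illustration, it scans an in-memory String instead of implementing Pig's LoadFunc over an input split, and it assumes the tag has no attributes and is never nested inside another element with the same name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the splitting idea in PIG-1284: every
 * <tag>...</tag> span in the input becomes one record.
 * Not the piggybank XMLLoader. Assumes no attributes on the tag
 * and no nested elements with the same name.
 */
public class XmlTagSplitter {

    /** Returns each <tag>...</tag> span, including both tags, in document order. */
    public static List<String> split(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(open, from);
            if (start < 0) {
                break; // no further opening tag
            }
            int end = xml.indexOf(close, start + open.length());
            if (end < 0) {
                break; // unterminated element: drop it
            }
            end += close.length();
            records.add(xml.substring(start, end));
            from = end; // resume scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        // Same content as the input.xml sample in the proposal.
        String input = String.join("\n",
            "<configuration>",
            "<property>",
            "<name> foobar </name>",
            "<value> barfoo </value>",
            "</property>",
            "<ignoreProperty>",
            "<name> foo </name>",
            "</ignoreProperty>",
            "<property>",
            "<name> justname </name>",
            "</property>",
            "</configuration>");
        // Wrap each record in parentheses, mimicking the dump output in the proposal.
        for (String record : split(input, "property")) {
            System.out.println("(" + record + ")");
        }
    }
}
```

Running main prints the two <property> records in the same form as the dump output in the proposal; <ignoreProperty> is skipped because only exact <property>...</property> spans are matched. A real Hadoop-based loader would likely also need to handle records that straddle input split boundaries, which this sketch ignores.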

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

