hadoop-pig-dev mailing list archives

From "Gerrit Jansen van Vuuren (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1526) HiveColumnarLoader Partitioning Support
Date Tue, 03 Aug 2010 10:54:16 GMT

     [ https://issues.apache.org/jira/browse/PIG-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gerrit Jansen van Vuuren updated PIG-1526:
------------------------------------------

    Attachment: PIG-1526-2.patch

The previous patch did not use the UDFContext signature, which caused the partition keys and
expression to be overwritten when the loader was used for more than one table. That is fixed
now.
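
To illustrate why keying by signature matters, here is a minimal sketch in plain Java. It does
not reproduce the patch or Pig's UDFContext API; the class SignatureKeyedState, the signature
strings and the property name "partition.keys" are all hypothetical. The idea is only that each
loader instance stores its state under its own signature, so two tables loaded in one script
do not overwrite each other's partition keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for per-loader state storage; not Pig's UDFContext API.
final class SignatureKeyedState {
    // One Properties object per loader signature.
    private final Map<String, Properties> bySignature = new HashMap<>();

    // Returns the Properties for this signature, creating it on first use.
    Properties forSignature(String signature) {
        return bySignature.computeIfAbsent(signature, s -> new Properties());
    }

    public static void main(String[] args) {
        SignatureKeyedState state = new SignatureKeyedState();
        // Two loader instances in the same script, each with its own signature (assumed names).
        state.forSignature("loader-1").setProperty("partition.keys", "year,month,day");
        state.forSignature("loader-2").setProperty("partition.keys", "region");
        // Each loader reads back its own keys; neither entry replaces the other.
        System.out.println(state.forSignature("loader-1").getProperty("partition.keys"));
        System.out.println(state.forSignature("loader-2").getProperty("partition.keys"));
    }
}
```

If both loaders wrote to a single shared Properties object instead, the second write would
replace the first, which matches the overwrite described above.
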
Also added hidden-file filtering to PathPartitionHelper: files or directories whose names
start with "_" are now ignored.
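
The hidden-file rule can be sketched roughly as follows. This is an illustration in plain Java
rather than the code in the patch: the class HiddenPathFilterSketch and its methods are
hypothetical, and paths are treated as simple "/"-separated strings instead of Hadoop Path
objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "_" rule; not the PathPartitionHelper implementation.
final class HiddenPathFilterSketch {
    // True when the last component of the path starts with "_".
    static boolean isHidden(String path) {
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String name = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        return name.startsWith("_");
    }

    // Returns the paths whose last component is not hidden, in their original order.
    static List<String> visible(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (!isHidden(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> listing = new ArrayList<>();
        listing.add("/mytable/_logs");
        listing.add("/mytable/year=2010");
        System.out.println(visible(listing));
    }
}
```

The check looks only at the last path component, so a scanner built this way would skip a
hidden directory at the point where it is listed and never descend into it.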


> HiveColumnarLoader Partitioning Support
> ---------------------------------------
>
>                 Key: PIG-1526
>                 URL: https://issues.apache.org/jira/browse/PIG-1526
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: PIG-1526-2.patch, PIG-1526.patch
>
>
> I've made a lot of improvements to the HiveColumnarLoader:
> -> Added support for LoadMetadata and data path partitioning
> -> Improved and simplified column loading
> Data Path Partitioning:
> Hive stores partitions as folders, like /mytable/partition1=[value]/partition2=[value].
> That is, the table mytable contains 2 partitions [partition1, partition2].
> The HiveColumnarLoader will scan the input path /mytable and add the columns partition1
> and partition2 to the Pig schema.
> These columns can then be used in filtering. 
> For example: We've got year,month,day,hour partitions in our data uploads.
> So a table might look like mytable/year=2010/month=02/day=01.
> Loading with the HiveColumnarLoader allows our Pig scripts to filter by date using the
> standard Pig FILTER operator.
> I've added 2 classes for this:
> -> PathPartitioner
> -> PathPartitionHelper
> These classes are not Hive-dependent and could be used by any other loader that wants
> to support partitioning; they also help with implementing the LoadMetadata interface.
> For this reason I thought it best to put them into the package org.apache.pig.piggybank.storage.partition.
> In the future it would be nice to have PigStorage also use these 2 classes to provide
> automatic path partitioning support.
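
The path convention described above (key=value folders under the table directory) can be
sketched as follows. This is a minimal illustration in plain Java, not the PathPartitioner or
PathPartitionHelper code from the patch: the class PartitionPathSketch and its methods are
hypothetical, and the filter is reduced to exact equality on partition values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of key=value path parsing; not the classes added by the patch.
final class PartitionPathSketch {
    // Collects the key=value components of a path, in order of appearance.
    static Map<String, String> parse(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String part : path.split("/")) {
            int eq = part.indexOf('=');
            // Require a non-empty key and a non-empty value.
            if (eq > 0 && eq < part.length() - 1) {
                partitions.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return partitions;
    }

    // True when every wanted key is present in the path with exactly the wanted value.
    static boolean matches(Map<String, String> partitions, Map<String, String> wanted) {
        for (Map.Entry<String, String> e : wanted.entrySet()) {
            if (!e.getValue().equals(partitions.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("/mytable/year=2010/month=02/day=01");
        System.out.println(p.keySet()); // the partition columns found in the path
    }
}
```

The keys found this way correspond to the extra columns a loader could add to the schema, and
a check like matches lets a loader skip whole directories before reading any data.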

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

