incubator-hcatalog-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject partition filter on condition determined by UDF
Date Tue, 20 Nov 2012 20:27:09 GMT
It doesn't seem like I'm able to call a UDF to determine the value of my
partition filter condition. For example, I'd like to do this within a Pig
MACRO:

DEFINE load_recent_signals(days, end_timebucket) RETURNS RECENT_SIGNALS {
  signals = load 'signals' using org.apache.hcatalog.pig.HCatLoader();

  $RECENT_SIGNALS = foreach (
    filter signals by (
      -- lower bound: partition for (end timebucket minus the given number of days, in ms)
      datetime_partition >= TimebucketToDatePartition($end_timebucket - (86400000L * $days)) AND
      -- upper bound: partition containing the end timebucket
      datetime_partition <= TimebucketToDatePartition($end_timebucket) AND
      relationship_id IS NOT NULL
    )) {
      generate ...;
  };
};

The TimebucketToDatePartition is a UDF that determines the partition value
(a STRING) based on a timestamp (LONG).
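The message doesn't say what format TimebucketToDatePartition produces, so the sketch below assumes daily partitions named yyyy-MM-dd in UTC. `timebucket_to_date_partition` and `partition_bounds` are hypothetical names. The sketch mirrors the macro's window arithmetic (end minus days × 86,400,000 ms), so the two bounds can be computed as plain strings before Pig runs.

```python
from datetime import datetime, timezone
from typing import Tuple

# Milliseconds per day; the same constant as 86400000L in the macro.
MS_PER_DAY = 86_400_000


def timebucket_to_date_partition(timebucket_ms: int) -> str:
    """Hypothetical stand-in for the TimebucketToDatePartition UDF.

    ASSUMPTION: partitions are daily and named yyyy-MM-dd in UTC.
    The real UDF's output format is not given in the message.
    """
    dt = datetime.fromtimestamp(timebucket_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d")


def partition_bounds(days: int, end_timebucket_ms: int) -> Tuple[str, str]:
    """Return (start, end) partition strings for a window of `days` days
    ending at end_timebucket_ms, using the same arithmetic as the macro."""
    start_ms = end_timebucket_ms - MS_PER_DAY * days
    return (timebucket_to_date_partition(start_ms),
            timebucket_to_date_partition(end_timebucket_ms))


start, end = partition_bounds(2, 1351785600000)
print(start, end)  # 2012-10-30 2012-11-01
```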

When I run this, I get an error saying the filter couldn't be "pushed" into
the load, which makes partitioning worthless. I have big data, so
partitioning is VERY important.

Of course, I also tried evaluating the UDFs when calling the MACRO, but
the Pig grammar is so limited that it doesn't accept UDF calls as macro
parameter values, e.g.:

signals_in = load_recent_signals(TimebucketToDatePartition(1351612800000L),
                                 TimebucketToDatePartition(1351785600000L));

This results in the error:

ERROR 1200: <line 5, column 58>  mismatched input '(' expecting RIGHT_PAREN
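Because macro arguments can't be UDF calls, one possible workaround is to compute the partition strings outside Pig and pass them in with Pig's `-param` option. This is an assumption, not a fix confirmed in this thread. The script would then compare `datetime_partition` against quoted string literals such as `'$start_partition'`. That constant-comparison form is the kind of partition filter HCatLoader is meant to receive, but whether it gets pushed down in a given Pig/HCatalog version is an expectation, not a guarantee. The script name `recent_signals.pig`, the parameter names, and `build_pig_command` are all hypothetical.

```python
import shlex
from typing import Dict, List


def build_pig_command(script: str, params: Dict[str, str]) -> List[str]:
    """Build a pig command line passing each entry in `params` via -param.

    Inside the (hypothetical) script, the filter would compare the partition
    column against quoted literals, e.g. datetime_partition >= '$start_partition'.
    """
    cmd = ["pig"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd.append(script)
    return cmd


# Hypothetical script name and parameter names, for illustration only.
cmd = build_pig_command(
    "recent_signals.pig",
    {"start_partition": "2012-10-30", "end_partition": "2012-11-01"},
)
print(shlex.join(cmd))
# pig -param start_partition=2012-10-30 -param end_partition=2012-11-01 recent_signals.pig
```

Keeping the command as a list means it can be handed to `subprocess.run` directly, with no shell quoting of the parameter values.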

So I'm at a loss as to what I can do here. It seems like evaluating a UDF
for a partition filter is a sensible thing to do with HCatalog and Pig.

I'm willing to crack open the code and fix this if someone can advise on
how to approach it: should I change the Pig grammar to allow UDF calls when
evaluating MACRO parameters, or change the HCatalog side to allow a UDF
call in a partition filter condition?

<rant>So far, I've had nothing but trouble with HCatalog and filtering by
partition keys in Pig. Isn't this one of the primary use cases of
HCatalog?</rant>
