pig-user mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Finding records with a given prefix
Date Tue, 02 Nov 2010 17:27:40 GMT
Basically you want to join on a regular expression, correct?
Unfortunately MapReduce (and thus Pig) is spectacularly bad at
non-equijoins.  Is 'prefixes' small enough to fit in memory?  If so, you
could write a UDF that loads it into memory and does the comparison.
This way the join would be done in the map phase.
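
A rough sketch of that in-memory comparison, written in plain Python rather
than as an actual Pig UDF. The names build_matcher and matches and the sample
records are made up for illustration. A real UDF would wrap the same logic
and read 'prefixes' once per task instead of taking a list:

```python
def build_matcher(prefix_lines):
    """Load prefixes into memory and return a predicate on names."""
    # Hypothetical helper: in a UDF these lines would come from 'prefixes'.
    prefixes = {line.strip() for line in prefix_lines if line.strip()}
    # Prefixes vary in length, so remember each distinct length once.
    lengths = sorted({len(p) for p in prefixes})

    def matches(name):
        """Return True if name begins with any loaded prefix."""
        if name is None:
            return False
        # One set lookup per distinct prefix length, not per prefix.
        return any(name[:n] in prefixes for n in lengths if n <= len(name))

    return matches


# Made-up sample data standing in for A = (id, name) and B = (prefix).
matches = build_matcher(["ab", "xyz"])
records = [(1, "abacus"), (2, "zebra"), (3, "xyzzy"), (4, "x")]
kept = [rec for rec in records if matches(rec[1])]
```

Because each record is checked independently against the in-memory set, no
shuffle or reduce step is needed for the comparison itself.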


On Nov 2, 2010, at 10:19 AM, Joe Ciaramitaro wrote:

> Hi all,
> I have 2 data files: one that contains a number of records, and
> one that contains a number of prefixes.
> A = load 'data' AS (id, name);
> B = load 'prefixes' AS (prefix);
> I'd like to pull the records in A whose name begins with one of the
> prefixes in B. The prefixes are of varying lengths.
> I've been scouring the documentation, but haven't figured out what  
> the best approach could be.
> Thanks for any help,
> Joe
