hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Small question
Date Thu, 04 Oct 2012 06:25:16 GMT
That's a fuzzy match (join two tables not on equality, but on one
table's column value matching a dynamically generated regex based on
another column). I don't know of efficient ways of doing that in MR,
be it Pig or Hive.. what is Hive's execution plan for that?

The only thing that comes to mind for me is a pretty fancy udf which
loads up one table completely in memory, and applies the match to all
entries as the other table is streamed through. But of course that
would be quite expensive if the lookup table is of any respectable
size.

D



On Wed, Oct 3, 2012 at 11:32 AM, J. Rottinghuis <jrottinghuis@gmail.com> wrote:
> <moved common-user@hadoop.apache.org to bcc and added user@pig.apache.org>
>
> Best asked on the Pig users list.
>
> Cheers,
>
> Joep
>
> On Wed, Oct 3, 2012 at 7:04 AM, Abhishek <abhishek.dodda1@gmail.com> wrote:
>
>> Hi all,
>>
>> Below hive query in pig latin how to do that.
>>
>> select t2.col1, t3.col2
>>
>> from table2 t2
>>
>> join table3 t3
>>
>> WHERE t3.col2 IS NOT NULL
>>
>> AND t2.col1 LIKE CONCAT(CONCAT('%',t3.col2),'%')
>>
>> Regards
>> Abhi
>>
>>
>>

Mime
View raw message