hadoop-mapreduce-user mailing list archives

From Rex X <dnsr...@gmail.com>
Subject Re: What's the best way to do Outer join and Inner join of two SequentialTextFiles using Hadoop streaming and Python ?
Date Sat, 23 Jan 2016 22:27:27 GMT
I Googled this, but did not find any sample code.


On Fri, Jan 22, 2016 at 9:50 AM, Rex X <dnsring@gmail.com> wrote:

> The two SequentialTextFiles correspond to two Hive tables, say tableA and
> tableB below on
>
>     hdfs://hive/tableA/YYYY/MM/DD/*/part-00000
> and
>     hdfs://hive/tableB/YYYY/MM/DD/*/part-00000
>
> Both of them are partitioned by date, for example,
>
>     hdfs://hive/tableA/2016/01/01/*/part-00000
>
> Now we want to do a left outer join on tableA.id=tableB.id, for a date
> range, for example, from 2015/12/01 to 2016/01/09.
>
> Within Hive it is pretty easy:
>
>     select * from tableA a left outer join tableB b
>     on a.id=b.id
>     where a.dt between '20151201' and '20160109'
>     and b.dt between '20151201' and '20160109';
>
>
> What's the best way to do Outer join and Inner join of these two
> SequentialTextFiles using Hadoop streaming and Python ?
>
> Any comments will be appreciated!
>
>
>
>
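
One common way to approach this with streaming is a reduce-side join: the mapper tags each record with the table it came from, Hadoop sorts by id, and the reducer pairs up the rows for each id. Below is a minimal sketch, not a tested production job. It assumes the records are tab-separated text with the id in the first field, that the table can be recognized from "/tableA/" or "/tableB/" in the input path, and that streaming exposes the input path as the environment variable mapreduce_map_input_file (the name can differ between Hadoop versions). The names tag_record and join_sorted, and the "NULL" placeholder, are made up for illustration.

```python
import os
import sys
from itertools import groupby


def tag_record(input_path, line):
    """Mapper step: turn 'id<TAB>rest' into 'id<TAB>tag<TAB>rest'.

    The tag is 'A' or 'B', guessed from the input path (an assumption).
    """
    tag = "A" if "/tableA/" in input_path else "B"
    key, _, rest = line.rstrip("\n").partition("\t")
    return "\t".join((key, tag, rest))


def join_sorted(lines, how="left"):
    """Reducer step: lines arrive grouped by id; yield joined rows.

    how='left' keeps tableA rows without a match, how='inner' drops them.
    Both sides of one id are buffered in memory, so no secondary sort
    on the tag is required.
    """
    parsed = (l.rstrip("\n").split("\t", 2) for l in lines)
    for key, group in groupby(parsed, key=lambda fields: fields[0]):
        a_rows, b_rows = [], []
        for _, tag, rest in group:
            (a_rows if tag == "A" else b_rows).append(rest)
        if not b_rows and how == "left":
            b_rows = ["NULL"]  # placeholder for the missing tableB side
        for a in a_rows:
            for b in b_rows:
                yield "\t".join((key, a, b))


if __name__ == "__main__":
    # Hypothetical usage: 'join.py map' as mapper, 'join.py reduce left' as reducer.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "map":
        # Assumed env var name; older releases may use map_input_file.
        path = os.environ.get("mapreduce_map_input_file", "")
        for line in sys.stdin:
            print(tag_record(path, line))
    elif role == "reduce":
        how = sys.argv[2] if len(sys.argv) > 2 else "left"
        for row in join_sorted(sys.stdin, how):
            print(row)
```

With this layout the date range would be selected by which partition directories are passed as -input to the streaming job, rather than by a filter inside the script. Buffering every row for an id in the reducer is simple but can run out of memory for heavily repeated ids; sorting on the tag as a secondary key would avoid that at the cost of extra job configuration.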
