hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Navis류승우 <navis....@nexr.com>
Subject Re: Pointing multiple external tables to the same location
Date Wed, 18 Dec 2013 07:30:58 GMT
Hive uses path to table(or partition) mapping internally (you can see that
in MapredWork, etc.), which might caused first table overwritten by other.

I didn't tried symlink on hdfs, which could be a solution.



2013/12/12 Petter von Dolwitz (Hem) <petter.von.dolwitz@gmail.com>

> Hi,
>
> I have declared several external tables pointing to the same location. The
> things that tells these tables apart (apart from their names) is that they
> have unique properties. These properties help me choose the correct rows
> from the underlying file. I use a single storage handler (accompanied by a
> single InputFormat and a single Serde) . The first columns in all tables
> are the same but the last (a struct) is unique and
> is constructed from the Serde (with help of the serde properties). A
> simplified version of the tables look like so:
>
> CREATE EXTERNAL TABLE Table1 (
>   column1 STRING,
>   column2 STRING)
>   STORED BY 'MyStorageHandler'
>   WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass1')
>   LOCATION 'mylocation'
>   TBLPROPERTIES('recordreader.filter'='table1_filter');
>
> CREATE EXTERNAL TABLE Table2 (
>   column1 STRING,
>   column2 STRING)
>   STORED BY 'MyStorageHandler'
>   WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass2')
>   LOCATION 'mylocation'
>   TBLPROPERTIES('recordreader.filter'='table2_filter');
>
>
> All works well for simple select queries towards the two tables. The
> following query gives very strange results though:
>
> SELECT * FROM (
>   SELECT column1,'Table1' FROM Table1 WHERE column2 = 'myValue'
>   union all
>   SELECT column1,'Table2' FROM Table2 WHERE column2 = 'myValue'
>   ) my_union
> ORDER BY my_union.column1
>
>
> It seems like one job task is created per file stored in the table
> location. This task gets the table properties from the second table and in
> the SerDe-step later on it seems like the records gets mixed up.
>
> I would have expected that hive would need to iterated the source files
> two times using two different tasks (with the correct table properties
> passed) in order to get this to work.
>
> Anyone here that can shed some light on this scenario?
>
> Thanks,
> Petter
>
>
>
>
>
>
>

Mime
View raw message