hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Santlal J Gupta <Santlal.Gu...@bitwiseglobal.com>
Subject RE: More than one table created at the same location
Date Thu, 01 Sep 2016 05:11:23 GMT
hi,

I also found that more than one table can be created by pointing to same location. But @Naveen
i had found some different behavior in case of count(*). Below thing i have tried . 

hive> create table test1(f1 int,f2 int) stored as orc location  '/user/xyz/warehouse/test1'
;
hive> insert into test1 values(3,4);
hive> insert into test1 values(1,2);
hive> select * from test1;
OK
3       4
1       2
hive> select count(*) from test1;
Query ID = xyz_20160901102727_df97784e-6166-4044-8ebd-dc76661bb78c
Total jobs = 1
....
OK
2

hive>  create table test2(f1 int,f2 int) stored as orc location  '/user/xyz/warehouse/test1'
;
hive> insert into test2 values(5,6);
hive> select * from test2;
OK
3       4
1       2
5       6
hive> select count(*) from test2;
Query ID = xyz_20160901102929_ba7736e0-a9bf-4f43-ace1-27c9ba0a709c
Total jobs = 1
.....
OK
3
hive> select count(*) from test1;
Query ID = xyz_20160901103131_58be44e2-fae5-4c44-88b3-40922cd55066
Total jobs = 1
.....
OK
3

 hadoop fs -ls /user/xyz/warehouse/test1
Found 3 items
-rwxr-xr-x   2 xyz xyz        246 2016-09-01 10:26 /user/xyz/warehouse/test1/000000_0
-rwxr-xr-x   2 xyz xyz        246 2016-09-01 10:26 /user/xyz/warehouse/test1/000000_0_copy_1
-rwxr-xr-x   2 xyz xyz        246 2016-09-01 10:28 /user/xyz/warehouse/test1/000000_0_copy_2

so from this, it is confirm that in hive table test2 is reading data from table test1.
So, is it expected behavior in hive or it is bug? 

Thanks
Santlal J Gupta

-----Original Message-----
From: naveen mahadevuni [mailto:nmahadevuni@gmail.com] 
Sent: Wednesday, August 31, 2016 7:42 PM
To: dev@hive.apache.org
Subject: Re: More than one table created at the same location

Hi,

I created external table, copied data files to that location and then count returns 4. It
is ambiguous, can it be documented?

hive> CREATE EXTERNAL TABLE test_ext (col1 INT, col2 INT)
    > stored as orc
    > LOCATION '/apps/hive/warehouse/ext'; OK Time taken: 9.875 seconds

hive> select count(*) from test_ext;
Query ID = root_20160831094725_14753b28-68bb-4106-89b7-45052e0cf9a1
Total jobs = 1
Launching Job 1 out of 1
....
OK
4
Time taken: 30.366 seconds, Fetched: 1 row(s)

hive> select * from test_ext;
OK
1       2
3       4
1       2
3       4
Time taken: 6.478 seconds, Fetched: 4 row(s)


On Wed, Aug 31, 2016 at 2:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:

> Naveen,
> Can you please verify if you create these tables as external tables 
> the results are correct ?
> In case of managed tables, the assumption is that there is a 1:1 
> mapping between tables and the locations and all update to the table 
> are through hive. With that assumption, it relies on stats to return 
> results in queries like count(*) .
>
>
> On Tue, Aug 30, 2016 at 4:18 AM, Abhishek Somani < 
> abhisheksomani88@gmail.com
> > wrote:
>
> > For the 2nd table(after both inserts are over), isn't the return 
> > count expected to be 4? In that case, isn't the the bug that the 
> > count was returned wrong(maybe from the stats as mentioned) rather 
> > the fact that another table was allowed to be created at the same location?
> >
> > I might be very wrong, so pardon my ignorance.
> >
> > On Tue, Aug 30, 2016 at 3:06 AM, Alan Gates <alanfgates@gmail.com>
> wrote:
> >
> > > Note that Hive doesn’t track individual files, just which 
> > > directory a table stores its files in.  So we wouldn’t expect this 
> > > to work.  The
> bug
> > is
> > > more that Hive doesn’t detect that two tables are trying to use 
> > > the
> same
> > > directory.  I’m not sure we’re anxious to fix this since it would 
> > > mean
> > when
> > > creating a table Hive would need to search all existing tables to 
> > > make
> > sure
> > > none of them are using the directory the new table wants to use.
> > >
> > > Alan.
> > >
> > > > On Aug 30, 2016, at 04:17, Sergey Shelukhin 
> > > > <sergey@hortonworks.com>
> > > wrote:
> > > >
> > > > This is a bug, or rather an unexpected usage. I suspect the 
> > > > correct
> > count
> > > > value is coming from statistics.
> > > > Can you file a JIRA?
> > > >
> > > > On 16/8/29, 00:51, "naveen mahadevuni" <nmahadevuni@gmail.com>
> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Is the following behavior a bug? I believe at least one part of 
> > > >> it
> is
> > a
> > > >> bug. I created two Hive tables at the same location and 
> > > >> inserted
> rows
> > in
> > > >> two tables. count(*) returns the correct count for each 
> > > >> individual
> > > table,
> > > >> but SELECT * on one tables reads the rows from other table 
> > > >> files
> too.
> > > >>
> > > >> CREATE TABLE test1 (col1 INT, col2 INT) stored as orc LOCATION 
> > > >> '/apps/hive/warehouse/test1';
> > > >>
> > > >> insert into test1 values(1,2);
> > > >> insert into test1 values(3,4);
> > > >>
> > > >> hive> select count(*) from test1;
> > > >> OK
> > > >> 2
> > > >> Time taken: 0.177 seconds, Fetched: 1 row(s)
> > > >>
> > > >>
> > > >> CREATE TABLE test2 (col1 INT, col2 INT) stored as orc LOCATION 
> > > >> '/apps/hive/warehouse/test1';
> > > >>
> > > >> insert into test2 values(1,2);
> > > >> insert into test2 values(3,4);
> > > >>
> > > >> hive> select count(*) from test2;
> > > >> OK
> > > >> 2
> > > >> Time taken: 2.683 seconds, Fetched: 1 row(s)
> > > >>
> > > >> -- SELECT * fetches 4 records where as COUNT(*) above returns 
> > > >> count
> of
> > > 2.
> > > >>
> > > >> hive> select * from test2;
> > > >> OK
> > > >> 1       2
> > > >> 3       4
> > > >> 1       2
> > > >> 3       4
> > > >> Time taken: 0.107 seconds, Fetched: 4 row(s)
> > > >> hive> select * from test1;
> > > >> OK
> > > >> 1       2
> > > >> 3       4
> > > >> 1       2
> > > >> 3       4
> > > >> Time taken: 0.054 seconds, Fetched: 4 row(s)
> > > >>
> > > >> Thanks,
> > > >> Naveen
> > > >
> > >
> > >
> >
>
Mime
View raw message