hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Konicek <martin.koni...@gmail.com>
Subject Re: Index
Date Wed, 08 Jun 2011 11:26:57 GMT
Hi,

I was testing indexes today as well and the index definitely got used. 
You should be able to see this when you run two separate queries:
INSERT OVERWRITE DIRECTORY "/tmp/index-result2" ...
SELECT ...

The SELECT was faster for me than without the index. In your case the 
time might be spent in the GROUP BY and maybe you have little data so 
the times look the same.

What is not so good is that index can't be partitioned on different 
columns than the table. E.g. I would like to partition the table on date 
and the index on region (I can't partition the table on both date&region 
bc there are thousands of regions and that would create huge directory 
structure in HDFS, which I read is not recommended).

Martin

On 08/06/2011 11:28, Guillaume WEILL wrote:
> Hi,
>
> I want to test the use of indexes in hive. For this I created anindex, 
> I launched a first query above, I changed the settings on Hive and ran 
> my query on my database table:
>
> CREATE INDEX index ON TABLE table(key) as 'COMPACT' WITH DEFERRED REBUILD;
> ALTER INDEX index ON table REBUILD;
> INSERT OVERWRITE DIRECTORY "/tmp/index-result2" SELECT `_bucketname` ,  `_offsets` FROM
default__table_index__ x WHERE x.key=100;
>
> SET hive.index.compact.file=/tmp/index_result2;
> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
> SELECT key, col2, sum(col3) FROM tableWHERE key=100  group BY col2;
>
>
>
> No error but I am not sure that the index is really used. Indeed I get 
> the same performance with and without the index.
> When I look at the logs (tasktracker, datanode, job_config), I see no 
> call to the directory / tmp / index_result2.
> How do I know if my index has been really used?
>
> Thanks for your help,
>
> -- 
> Guillaume WEILL

Mime
View raw message