hive-user mailing list archives

From Guillaume WEILL
Subject Re: Index
Date Wed, 08 Jun 2011 16:08:51 GMT
Thanks for your reply.

Could you tell me how much time the index saved you, and how long the same
query takes without the index? The size of your table would be helpful too.
I ask because I want to check your point about the amount of data: I am
really not impressed by the performance of my index.
Personally, I am working with 100 GB.


2011/6/8 Martin Konicek <>

> Hi,
> I was testing indexes today as well and the index definitely got used. You
> should be able to see this when you run two separate queries:
> INSERT OVERWRITE DIRECTORY "/tmp/index-result2" ...
> SELECT ...
> The SELECT was faster for me than without the index. In your case the time
> might be spent in the GROUP BY and maybe you have little data so the times
> look the same.
> What is not so good is that an index can't be partitioned on different columns
> than the table. E.g. I would like to partition the table on date and the
> index on region (I can't partition the table on both date and region because
> there are thousands of regions and that would create a huge directory structure
> in HDFS, which I read is not recommended).
> Martin
> On 08/06/2011 11:28, Guillaume WEILL wrote:
>> Hi,
>> I want to test the use of indexes in Hive. For this I created an index,
>> ran a first query against it, changed the Hive settings, and then ran my
>> query on my table:
>> ALTER INDEX index ON table REBUILD;
>> INSERT OVERWRITE DIRECTORY "/tmp/index-result2" SELECT `_bucketname` ,
>>  `_offsets` FROM default__table_index__ x WHERE x.key=100;
>> SET hive.index.compact.file=/tmp/index_result2;
>> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>> SELECT key, col2, sum(col3) FROM table WHERE key=100 GROUP BY col2;
>> No error but I am not sure that the index is really used. Indeed I get the
>> same performance with and without the index.
>> When I look at the logs (tasktracker, datanode, job_config), I see no call
>> to the directory /tmp/index_result2.
>> How do I know if my index has been really used?
>> Thanks for your help,
>> --
>> Guillaume WEILL
