hive-user mailing list archives

From shrikanth shankar <>
Subject Re: how to select without Mapreduce after index build?
Date Sat, 12 May 2012 04:04:44 GMT
My understanding is that the scan of the index is used to remove splits that are known not
to contain matching data. If you remove enough splits the second MR task will run much faster.
The index should also be much smaller than the base table and that MR task should be much
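
To make the pruning idea concrete, here is a toy sketch in Python rather than Hive. The table contents and the names `SPLITS`, `build_index`, and `select_with_index` are made up for illustration, and this is not how Hive's index handlers are implemented. It only shows that a small index lookup can shrink the set of splits the second pass has to scan.

```python
from collections import defaultdict

# Hypothetical base table: split id -> rows of (c1, c2, index_col).
SPLITS = {
    0: [("a", 10, 1), ("b", 20, 2)],
    1: [("c", 30, 3), ("d", 40, 4)],
    2: [("e", 50, 1), ("f", 60, 5)],
}


def build_index(splits):
    """Map each index_col value to the set of split ids that contain it."""
    index = defaultdict(set)
    for split_id, rows in splits.items():
        for _c1, _c2, key in rows:
            index[key].add(split_id)
    return index


def select_with_index(splits, index, key):
    """Pass 1: look up candidate splits in the (small) index.
    Pass 2: scan only those splits and filter the rows."""
    candidates = sorted(index.get(key, ()))
    rows = [
        (c1, c2)
        for split_id in candidates
        for c1, c2, k in splits[split_id]
        if k == key
    ]
    return candidates, rows


index = build_index(SPLITS)
scanned, result = select_with_index(SPLITS, index, 1)
print(scanned, result)  # [0, 2] [('a', 10), ('e', 50)]
```

In this toy case the lookup for `index_col = 1` skips split 1 entirely. The benefit grows with the fraction of splits that can be skipped, and shrinks to nothing if the key appears in every split.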

On May 11, 2012, at 8:56 PM, ransom.hezhiqiang wrote:

> Thanks Ashish
> After the index is built, the query is split into three steps:
> 1. Query the index table to get the offsets.
> 2. Move the result.
> 3. Get the select result by offset.
> So I think the query will be slower than with no index, because it has more steps and runs
> two MapReduce tasks.
> So why does the index exist? There is no performance improvement.
> Best regards
> Ransom.
> From: Ashish Thusoo [] 
> Sent: Saturday, May 12, 2012 12:18 AM
> To:
> Cc: Zhaojun (Terry)
> Subject: Re: how to select without Mapreduce after index build?
> Indexing in Hive works through map/reduce. There are no active components in Hive (such
> as the region servers in HBase), so the way the index is basically used is by running the
> map/reduce job on the table that holds the index data to get all the relevant offsets into
> the main table and then using those offsets to figure out which blocks to read from the main
> table. So you will not see map/reduce go away even when you are running queries on tables
> with indexes on them.
> Ashish
> On Thu, May 10, 2012 at 11:32 PM, Hezhiqiang (Ransom) <>
> I thought that if I create an index on a table, then executing
> “select c1,c2 from tab where index_col=1” should not start a MapReduce job.
> But it did start one.
> So how can I use an index without MapReduce?
> I tested both the compact index and the bitmap index; both need MapReduce.
