From: shrikanth shankar
To: user@hive.apache.org
Date: Fri, 11 May 2012 21:04:44 -0700
Subject: Re: how to select without Mapreduce after index build?

My understanding is that the scan of the index is used to remove splits that are known not to contain matching data. If you remove enough splits, the second MR task will run much faster. The index should also be much smaller than the base table, so the MR task that scans it should be much cheaper.

Shrikanth

On May 11, 2012, at 8:56 PM, ransom.hezhiqiang wrote:

> Thanks Ashish,
>
> After the index is built, the query is split into three steps:
> 1. Query the index table and get the offsets.
> 2. Move the result.
> 3. Get the select result by offset.
>
> So I think the query will be slower than it would be without the index, because it has more steps and runs two MapReduce tasks.
>
> So why does the index exist? I see no performance improvement.
>
> Best regards
> Ransom.
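Shrikanth's point and Ransom's objection can both hold; which one wins depends on how selective the predicate is and how large the table is. A back-of-the-envelope cost model makes that concrete. It is a sketch under made-up assumptions, not anything Hive computes: each MR job pays a fixed startup time, data is scanned at a uniform rate, and the table size, index size, and fraction of splits kept after pruning are hypothetical inputs. `full_scan_seconds` and `indexed_seconds` are illustrative names only.

```python
def full_scan_seconds(table_gb, scan_gb_per_s, job_startup_s):
    """Plan without the index: one MR job scans every split of the table."""
    return job_startup_s + table_gb / scan_gb_per_s


def indexed_seconds(table_gb, index_gb, kept_fraction, scan_gb_per_s, job_startup_s):
    """Plan with the index: job 1 scans the (small) index table, job 2 scans
    only the fraction of the base table whose splits survived pruning."""
    scanned_gb = index_gb + kept_fraction * table_gb
    return 2 * job_startup_s + scanned_gb / scan_gb_per_s


if __name__ == "__main__":
    # Hypothetical inputs: 1 GB/s scan rate, 20 s startup per MR job.
    rate, startup = 1.0, 20
    # Large table, selective predicate: 2 GB index, 5% of splits kept.
    print(f"{full_scan_seconds(100, rate, startup):.1f}")          # 120.0
    print(f"{indexed_seconds(100, 2, 0.05, rate, startup):.1f}")   # 47.0
    # Small table: the second job's startup time dominates.
    print(f"{full_scan_seconds(1, rate, startup):.1f}")            # 21.0
    print(f"{indexed_seconds(1, 0.02, 0.05, rate, startup):.1f}")  # 40.1
```

Under these assumptions the indexed plan wins when the index is small and the predicate keeps only a few splits (about 47 s versus 120 s here). It loses on a small table, or when most splits survive pruning, because the extra job's startup time outweighs the bytes it saves.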
>
> From: Ashish Thusoo [mailto:athusoo@qubole.com]
> Sent: Saturday, May 12, 2012 12:18 AM
> To: user@hive.apache.org
> Cc: Zhaojun (Terry)
> Subject: Re: how to select without Mapreduce after index build?
>
> Indexing in Hive works through map/reduce. There are no active components in Hive (such as the region servers in HBase), so the index is used by running a map/reduce job on the table that holds the index data to get all the relevant offsets into the main table, and then using those offsets to figure out which blocks to read from the main table. So you will not see map/reduce go away even when you are running queries on tables with indexes on them.
>
> Ashish
>
> On Thu, May 10, 2012 at 11:32 PM, Hezhiqiang (Ransom) wrote:
> I think that if I create an index on a table, then when I execute "select c1,c2 from tab where index_col=1" it should not start a MapReduce job.
> But it does start one.
> So how can I use an index without MapReduce?
> I tested both the compact index and the bitmap index; both need MapReduce.
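Ashish's description of the lookup can be sketched in a few lines of Python. This is a hypothetical illustration, not Hive's implementation: `INDEX_ROWS` stands in for a compact index table whose rows hold an indexed value, a base-table file name, and the byte offsets of matching rows (a layout assumed here for illustration), and the 128 MiB block size is an assumed value. In Hive the first phase is itself a map/reduce job over the index table, which is why MapReduce does not go away.

```python
from collections import defaultdict

# Hypothetical compact-index rows: (indexed value, base-table file, byte
# offsets of rows holding that value). Layout assumed for illustration.
INDEX_ROWS = [
    (1, "tab/000000_0", [10, 2_000]),
    (1, "tab/000001_0", [300 * 2**20]),
    (2, "tab/000000_0", [5_000]),
]

BLOCK_SIZE = 128 * 2**20  # assumed 128 MiB block size


def blocks_to_read(index_rows, value, block_size):
    """Return {file: sorted block numbers} that may contain rows with `value`.

    Phase 1 (a map/reduce job over the index table in Hive): keep only the
    index rows whose indexed value matches the predicate.
    Phase 2: map each byte offset to the block that contains it; only those
    blocks of the base table need to be read.
    """
    blocks = defaultdict(set)
    for indexed_value, path, offsets in index_rows:
        if indexed_value == value:
            for offset in offsets:
                blocks[path].add(offset // block_size)
    return {path: sorted(numbers) for path, numbers in blocks.items()}


if __name__ == "__main__":
    # Predicate: where index_col = 1
    print(blocks_to_read(INDEX_ROWS, 1, BLOCK_SIZE))
    # {'tab/000000_0': [0], 'tab/000001_0': [2]}
```

In this sketch only the returned blocks are handed to the second job; a value that never appears in the index yields an empty mapping, so no block of the base table is scanned.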
