Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
From: shrikanth shankar <sshankar@qubole.com>
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_165E52A4-ECDC-484A-9767-E38DB2686B02"
Subject: Re: how to select without Mapreduce after index build?
Date: Fri, 11 May 2012 21:04:44 -0700
In-Reply-To: <015101cd2ff3$30bea010$923be030$@com>
To: user@hive.apache.org
References: 
 <026833C91E2A1146B97EF8B717408EDF2EA0405A@szxeml534-mbx.china.huawei.com>
 <CAOn+50LeLhg8v=oUDgjsSz00OPPf5QJJng369DPPBtUjy5hOmA@mail.gmail.com>
 <015101cd2ff3$30bea010$923be030$@com>
Message-Id: <DE464D65-C84C-48EC-AF5F-6AACA874A66A@qubole.com>


--Apple-Mail=_165E52A4-ECDC-484A-9767-E38DB2686B02
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

My understanding is that the scan of the index is used to eliminate input
splits that are known not to contain matching data. If enough splits are
eliminated, the second MR job (the one over the base table) will run much
faster. The index should also be much smaller than the base table, so the
MR job that scans it should be much cheaper.
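A toy model of that split elimination, as a sketch only: it assumes the index scan has already produced, per base-table file, the byte offsets of matching rows, and that an input split is a (path, start, length) byte range. `Split`, `prune_splits`, the file path and the split size are all hypothetical; this is not Hive's actual input-format code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Split:
    """Hypothetical input split: the byte range [start, start + length) of one file."""
    path: str
    start: int
    length: int

    def contains(self, offset: int) -> bool:
        return self.start <= offset < self.start + self.length


def prune_splits(splits: Iterable[Split],
                 index_hits: Dict[str, List[int]]) -> List[Split]:
    """Keep only the splits that contain at least one offset found by the index scan."""
    kept = []
    for split in splits:
        offsets = index_hits.get(split.path, [])
        if any(split.contains(off) for off in offsets):
            kept.append(split)
    return kept


# Made-up example: one base-table file cut into 8 splits, and the index
# scan reports a single matching offset that falls inside split 5.
SPLIT_SIZE = 128 * 1024 * 1024  # hypothetical split size in bytes
splits = [Split("/warehouse/tab/000000_0", i * SPLIT_SIZE, SPLIT_SIZE)
          for i in range(8)]
hits = {"/warehouse/tab/000000_0": [5 * SPLIT_SIZE + 42]}
kept = prune_splits(splits, hits)
print(f"scanning {len(kept)} of {len(splits)} splits")
```

The second MR job then only receives the surviving splits, which is where the speedup comes from. If the predicate matches rows in most splits, nothing is pruned and the index job is pure overhead.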

Shrikanth
On May 11, 2012, at 8:56 PM, ransom.hezhiqiang wrote:

> Thanks, Ashish.
>
> After the index is built, the query is split into three steps:
> 1. Query the index table and get the offsets.
> 2. Move the result.
> 3. Get the select result from the base table using the offsets.
> So I think the query will be slower than with no index, because it has
> more steps and runs two MapReduce jobs.
>
> So why do indexes exist, if there is no performance improvement?
>
> Best regards
> Ransom.
>
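Whether two jobs are slower than one depends on how much the index scan prunes. A back-of-envelope cost model makes the trade-off concrete. It is a sketch only: every input (table size, index size, a fixed startup overhead per job, an aggregate scan rate, the fraction of splits kept) is a hypothetical number, not a measurement, and the model ignores skew, scheduling and the like. The function names are hypothetical too.

```python
import math


def full_scan_seconds(table_bytes: float, job_overhead_s: float,
                      scan_rate: float) -> float:
    """One MapReduce job that reads the whole base table."""
    return job_overhead_s + table_bytes / scan_rate


def indexed_seconds(table_bytes: float, index_bytes: float, kept_fraction: float,
                    job_overhead_s: float, scan_rate: float) -> float:
    """Job 1 scans the index table; job 2 reads only the splits that survive pruning."""
    index_job = job_overhead_s + index_bytes / scan_rate
    base_job = job_overhead_s + kept_fraction * table_bytes / scan_rate
    return index_job + base_job


def break_even_fraction(table_bytes: float, index_bytes: float,
                        job_overhead_s: float, scan_rate: float) -> float:
    """Largest kept fraction for which the indexed plan is still not slower."""
    return 1.0 - (job_overhead_s * scan_rate + index_bytes) / table_bytes


# Hypothetical inputs: 1 TB table, 10 GB index, 30 s startup per job,
# 1 GB/s aggregate scan rate across the cluster.
T, I, OVERHEAD, RATE = 1e12, 1e10, 30.0, 1e9
full = full_scan_seconds(T, OVERHEAD, RATE)               # 1030 s
selective = indexed_seconds(T, I, 0.01, OVERHEAD, RATE)   # 80 s, 1% of splits kept
unselective = indexed_seconds(T, I, 1.0, OVERHEAD, RATE)  # 1070 s, nothing pruned
print(full, selective, unselective, break_even_fraction(T, I, OVERHEAD, RATE))
```

So the concern is right when the predicate matches rows in nearly every split: the indexed plan is then slightly slower. The index pays off when the index job is cheap and the base-table job reads only a small fraction of the table.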
> From: Ashish Thusoo [mailto:athusoo@qubole.com]
> Sent: Saturday, May 12, 2012 12:18 AM
> To: user@hive.apache.org
> Cc: Zhaojun (Terry)
> Subject: Re: how to select without Mapreduce after index build?
>
> Indexing in Hive works through map/reduce. There are no active
> components in Hive (such as the region servers in HBase), so the index
> is used by running a map/reduce job on the table that holds the index
> data to get all the relevant offsets into the main table, and then
> using those offsets to figure out which blocks to read from the main
> table. So you will not see map/reduce go away even when you are running
> queries on tables with indexes on them.
>
> Ashish
>
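Ashish's description can be sketched as two steps, under loud assumptions: each index row is modelled as (indexed value, base-table file, list of byte offsets), loosely following our understanding of the compact index table layout (`_bucketname` and `_offsets` columns), and base-table blocks are fixed-size. `IndexRow`, `lookup_offsets`, `blocks_to_read`, the paths and the 4 KB block size are hypothetical. The key point is the first function: with no active server holding the index, the only way to use it is to scan every index row, which in Hive is itself a map/reduce job.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical index row: (indexed value, base-table file, offsets in that file).
IndexRow = Tuple[int, str, List[int]]


def lookup_offsets(index_rows: Iterable[IndexRow], wanted: int) -> Dict[str, Set[int]]:
    """Step 1: scan every index row and collect the offsets of matches, per file.

    Nothing keeps the index resident in memory, so there is no keyed lookup;
    every query has to read all of the index data.
    """
    hits: Dict[str, Set[int]] = defaultdict(set)
    for value, path, offsets in index_rows:
        if value == wanted:
            hits[path].update(offsets)
    return dict(hits)


def blocks_to_read(hits: Dict[str, Set[int]], block_size: int) -> Set[Tuple[str, int]]:
    """Step 2: map each offset to the start of the fixed-size block containing it."""
    return {(path, (off // block_size) * block_size)
            for path, offsets in hits.items()
            for off in offsets}


index_rows = [
    (1, "/warehouse/tab/000000_0", [0, 4100]),
    (2, "/warehouse/tab/000000_0", [8192]),
    (1, "/warehouse/tab/000001_0", [300]),
]
hits = lookup_offsets(index_rows, wanted=1)  # the predicate index_col=1
blocks = blocks_to_read(hits, block_size=4096)
print(sorted(blocks))
```

Only the listed blocks are read by the base-table job, which is why map/reduce does not go away but can do much less work.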
> On Thu, May 10, 2012 at 11:32 PM, Hezhiqiang (Ransom) <ransom.hezhiqiang@huawei.com> wrote:
> I thought that if I create an index on a table, executing
> "select c1,c2 from tab where index_col=1" should not start a MapReduce job.
> But it does start one.
> So how can I use an index without MapReduce?
> I tested both the compact index and the bitmap index; both need MapReduce.


--Apple-Mail=_165E52A4-ECDC-484A-9767-E38DB2686B02--