Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
MIME-Version: 1.0
In-Reply-To: <CAJ3fcbA-EmWMwWV58XT_HdZEDgojfp5BRQTcWBGcXKhdEDj5rg@mail.gmail.com>
References: <CAKZg861j+SFw-Hhi0OGYVY_njauOpYths5yzvdhpihzMDcZ3Yg@mail.gmail.com>
	<CAJ3fcbA-EmWMwWV58XT_HdZEDgojfp5BRQTcWBGcXKhdEDj5rg@mail.gmail.com>
Date: Sat, 14 May 2016 21:38:55 +0900
Message-ID: <CAKZg861b6QsMBTCFG+Xax1Xr2kxBV7cZ44NwRWwq0hC6qoLzCg@mail.gmail.com>
Subject: Re: clustered bucket and tablesample
From: no jihun <jeesim2@gmail.com>
To: user@hive.apache.org
Content-Type: multipart/alternative; boundary=001a113d05f0b08ba40532ccac89
archived-at: Sat, 14 May 2016 12:39:02 -0000

--001a113d05f0b08ba40532ccac89
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Ah, as I mentioned, both action_id and classifier are of type STRING, and I
cannot change the type.

CREATE TABLE `X`(`action_id` string,`classifier` string)
CLUSTERED BY (action_id,classifier) INTO 256 BUCKETS
STORED AS ORC

I bucket on the hash of both fields together because neither field is well
distributed on its own.

My concern is not the quality of the hash input, but how I can use
TABLESAMPLE to target the one bucket that holds the field values given in
the WHERE clause.

When a table is clustered by string fields, which form is right for
TABLESAMPLE?

1. Provide the fields
TABLESAMPLE(BUCKET 1 OUT OF 256 ON action_id, classifier)

2. Provide the values
TABLESAMPLE(BUCKET 1 OUT OF 256 ON 'aaa', 'bbb')
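
For illustration, here is a minimal Python sketch of how the bucket number for
a pair of string values might be derived. It assumes the legacy Hive bucketing
hash (a Java-style 31-based hash over each string's UTF-8 bytes, combined
across the CLUSTERED BY columns with the same multiplier) and that BUCKET x in
TABLESAMPLE corresponds to hash index x - 1. Both points are assumptions that
have not been confirmed in this thread, and every function name below is
hypothetical rather than a Hive API.

```python
# Sketch only: mimics what is assumed to be Hive's legacy bucketing hash.
# All function names here are hypothetical, not Hive APIs.

def _to_int32(n: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as Java int arithmetic would."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def string_hash(s: str) -> int:
    """31-based rolling hash over the UTF-8 bytes of s (bytes treated as signed)."""
    h = 0
    for b in s.encode("utf-8"):
        signed_byte = b - 256 if b > 127 else b
        h = _to_int32(31 * h + signed_byte)
    return h


def bucket_hash(values) -> int:
    """Combine the per-column hashes in CLUSTERED BY column order."""
    h = 0
    for v in values:
        h = _to_int32(31 * h + string_hash(v))
    return h


def tablesample_bucket(values, num_buckets: int) -> int:
    """1-based bucket number for TABLESAMPLE(BUCKET x OUT OF num_buckets ...)."""
    return (bucket_hash(values) & 0x7FFFFFFF) % num_buckets + 1


if __name__ == "__main__":
    x = tablesample_bucket(["aaa", "bbb"], 256)
    print(f"TABLESAMPLE(BUCKET {x} OUT OF 256 ON action_id, classifier)")
```

If these assumptions hold, the ON clause would name the columns (form 1), and
the bucket number x is the part that depends on the values in the WHERE
clause. Whether Hive then actually skips the other 255 files is a separate
question that this sketch does not answer.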

On 14 May 2016 at 8:48 PM, "Mich Talebzadeh" <mich.talebzadeh@gmail.com> wrote:

> Can action_id be created as a numeric column?
>
> CREATE TABLE X ( action_id bigint,  ..)
>
> Bucketing or hash partitioning works best on numeric columns with high
> cardinality (say a primary key).
>
> From my old notes:
>
> Bucketing in Hive refers to hash partitioning where a hashing function is
> applied. Like an RDBMS such as Oracle, Hive will apply a linear hashing
> algorithm to prevent data from clustering within specific partitions.
> Hashing is very effective if the column selected for bucketing has very
> high selectivity, like an ID column where selectivity (select
> count(distinct(column))/count(column)) = 1. In this case, the created
> partitions/files will be as evenly sized as possible. In a nutshell,
> bucketing is a method to get data evenly distributed over many
> partitions/files. One should define the number of buckets as a power of
> two (2^n, like 2, 4, 8, 16 etc.) to achieve best results. Again, bucketing
> will help concurrency in Hive. It may even allow a partition-wise join, i.e.
> a join between two tables that are bucketed on the same column with the
> same number of buckets (has anyone tried this?)
>
>
>
> One more thing. When one defines the number of buckets at table creation
> level in Hive, the number of partitions/files will be fixed. In contrast,
> with partitioning you do not have this limitation.
>
> Can you run
>
> show create table X
>
> and send the output, please?
>
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 May 2016 at 12:23, no jihun <jeesim2@gmail.com> wrote:
>
>> Hello.
>>
>> I want to ask about the correct way to use bucketing and TABLESAMPLE.
>>
>> There is a table X which I created by
>>
>> CREATE TABLE `X`(`action_id` string,`classifier` string)
>> CLUSTERED BY (action_id,classifier) INTO 256 BUCKETS
>> STORED AS ORC
>>
>> Then I inserted 500M rows into X with
>>
>> set hive.enforce.bucketing=true;
>> INSERT OVERWRITE TABLE X SELECT * FROM X_RAW;
>>
>> Then I want to count or search for rows matching a condition. Roughly,
>>
>> SELECT COUNT(*) FROM X WHERE action_id='aaa' AND classifier='bbb'
>>
>> But I would rather use TABLESAMPLE, since I clustered X by (action_id,
>> classifier). So the better query would be
>>
>> SELECT COUNT(*) FROM X
>> TABLESAMPLE(BUCKET 1 OUT OF 256 ON action_id, classifier)
>> WHERE action_id='aaa' AND classifier='bbb'
>>
>> Is anything wrong with the above? I cannot find any performance gain
>> between these two queries.
>>
>> Query 1 and result (with no TABLESAMPLE):
>>
>> SELECT COUNT(*) FROM X
>> WHERE action_id='aaa' AND classifier='bbb'
>>
>> --------------------------------------------------------------------------------
>>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
>> --------------------------------------------------------------------------------
>> Map 1 ..........   SUCCEEDED    256        256        0        0       0       0
>> Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
>> --------------------------------------------------------------------------------
>> VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 15.35 s
>> --------------------------------------------------------------------------------
>> It scans the full data.
>>
>> Query 2 and result (with TABLESAMPLE):
>>
>> SELECT COUNT(*) FROM X
>> TABLESAMPLE(BUCKET 1 OUT OF 256 ON action_id, classifier)
>> WHERE action_id='aaa' AND classifier='bbb'
>>
>> --------------------------------------------------------------------------------
>>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
>> --------------------------------------------------------------------------------
>> Map 1 ..........   SUCCEEDED    256        256        0        0       0       0
>> Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
>> --------------------------------------------------------------------------------
>> VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 15.82 s
>> --------------------------------------------------------------------------------
>> It ALSO scans the full data.
>>
>> Query 2 result that I expected:
>>
>> The result I expected is something like the following
>> (1 map task, and relatively faster than without TABLESAMPLE):
>>
>> --------------------------------------------------------------------------------
>>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
>> --------------------------------------------------------------------------------
>> Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
>> Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
>> --------------------------------------------------------------------------------
>> VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.xx s
>> --------------------------------------------------------------------------------
>>
>> The values of action_id and classifier are well distributed and there is
>> no skewed data.
>>
>> So I want to ask: what is the correct query to prune the scan down to one
>> specific bucket when the table is bucketed on multiple columns?
>>
>
>

--001a113d05f0b08ba40532ccac89--